this post was submitted on 23 Jun 2023

Linux

cross-posted from: https://lemmy.run/post/15922

Running Commands in Parallel in Linux

In Linux, you can execute multiple commands simultaneously by running them in parallel. When the commands are independent of each other, this can shorten the total run time by keeping more of your CPU cores (or network bandwidth) busy. In this tutorial, we will explore different methods to run commands in parallel in a Linux environment.

Method 1: Using the & (ampersand) symbol

The simplest way to run commands in parallel is by appending the & symbol at the end of each command. Here's how you can do it:

command_1 & command_2 & command_3 &

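This runs each command in the background, so the commands execute concurrently and the shell returns to the prompt immediately. Keep in mind that & starts every command at once, with no limit on how many run at the same time; the xargs and GNU Parallel methods below let you cap that number.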

For example, to compress three different files in parallel using the gzip command:

gzip file1.txt & gzip file2.txt & gzip file3.txt &
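Because the shell does not wait for background jobs, a script that depends on their results usually calls the shell's wait builtin afterwards. The sketch below is a minimal POSIX sh example. The sleep/echo jobs are placeholders for real commands such as the gzip calls above, and the variable names are only for illustration.

```shell
#!/bin/sh
# Start three placeholder jobs in the background, remembering each PID.
pids=""
for n in 1 2 3; do
  ( sleep 1; echo "job $n done" ) &
  pids="$pids $!"        # $! is the PID of the most recent background job
done

# Block until every job has exited. "wait PID" returns that job's exit status,
# so a failing job is counted instead of being silently ignored.
failed=0
for pid in $pids; do
  wait "$pid" || failed=$((failed + 1))
done
echo "all jobs finished, $failed failed"
```

The "job N done" lines may appear in any order, since the jobs run concurrently.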

Method 2: Using xargs with -P option

The xargs command builds and executes commands from standard input. Its -P option, supported by both GNU and BSD xargs, sets the maximum number of commands to run in parallel. Here's an example:

printf '%s\n' "command_1" "command_2" "command_3" | xargs -P 3 -I {} sh -c '{}'

In this example, printf prints each command on its own line. This list is piped (|) to xargs, and -I {} makes xargs run sh -c once per input line, with {} replaced by that line. The -P 3 option lets up to three of those invocations run at the same time. Adjust the number according to your requirements. There is no need for a trailing & here: xargs only exits once every command it started has finished, so the next step of a script can safely follow it.
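xargs also reports failures through its exit status: POSIX requires a non-zero status if any invocation fails, and GNU xargs uses 123 for that case. Here is a small sketch with placeholder commands, where false stands in for a job that fails:

```shell
#!/bin/sh
# Run a list of placeholder commands, at most 3 at a time.
printf '%s\n' "echo first" "false" "echo third" |
  xargs -P 3 -I {} sh -c '{}'
status=$?   # exit status of xargs, the last command in the pipeline

# Non-zero means at least one invocation failed (GNU xargs: 123).
if [ "$status" -ne 0 ]; then
  echo "at least one command failed (xargs exit status $status)"
fi
```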

For instance, to download three files in parallel with wget:

printf '%s\n' http://example.com/file1.txt http://example.com/file2.txt http://example.com/file3.txt | xargs -P 3 -n 1 wget

Here each input line is a plain URL, so -n 1 hands exactly one URL to each wget invocation.
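When the inputs are file names rather than URLs, it is safer to pass them NUL-separated so that names containing spaces or quotes are not split. This is a minimal sketch, assuming GNU or BSD find and xargs (both support -print0, -0 and -P). The scratch directory and file names are hypothetical.

```shell
#!/bin/sh
# Create a scratch directory with a few sample .txt files (one name has a space).
dir=$(mktemp -d)
for name in file1 "file 2" file3; do
  printf 'sample text\n' > "$dir/$name.txt"
done

# -print0 / -0 pass names NUL-separated, -n 1 gives each gzip one file,
# and -P 3 runs up to three gzip processes at a time.
find "$dir" -type f -name '*.txt' -print0 | xargs -0 -P 3 -n 1 gzip

ls "$dir"
```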

Method 3: Using GNU Parallel

GNU Parallel is a tool designed specifically for running jobs in parallel, with many more options than xargs. To use GNU Parallel, follow these steps:

  1. Install GNU Parallel if it's not already installed. You can typically find it in your Linux distribution's package manager, usually under the package name parallel.

  2. Create a file (e.g., commands.txt) and add one command per line:

    command_1
    command_2
    command_3
    
  3. Run the following command to execute the commands in parallel:

    parallel -j 3 < commands.txt
    

    When no command is given, GNU Parallel runs each input line as a command. The -j 3 option specifies the maximum number of parallel jobs to run. Adjust it according to your needs.
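The steps above can be sketched as one script. It assumes a hypothetical commands.txt written to the current directory. It also checks that the installed parallel really is GNU Parallel, because some distributions ship an unrelated parallel command from moreutils. If GNU Parallel is missing, the sketch falls back to xargs -P, which assumes GNU or BSD xargs for -0 and -P.

```shell
#!/bin/sh
# Step 2: write one placeholder command per line.
printf '%s\n' 'echo job one' 'echo job two' 'echo job three' > commands.txt

# Step 3: run them, at most 3 at a time.
if parallel --version 2>/dev/null | grep -q 'GNU parallel'; then
  # With no command given, GNU Parallel executes each input line.
  parallel -j 3 < commands.txt
else
  # Fallback: turn newlines into NULs so quotes in commands survive,
  # then pass each line as the script for its own "sh -c".
  tr '\n' '\0' < commands.txt | xargs -0 -n 1 -P 3 sh -c
fi
```

Output lines can arrive in any order. GNU Parallel, by default, holds each job's output until that job finishes, so lines from different jobs are not interleaved.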

For example, if you have a file called urls.txt with one URL per line and you want to download them in parallel using wget (GNU Parallel replaces {} with each input line):

parallel -j 3 wget {} < urls.txt

GNU Parallel also offers numerous advanced options for complex parallel job management. Refer to its documentation for further information.

Conclusion

Running independent commands in parallel can significantly speed up your tasks by making better use of the available CPU cores and I/O. In this tutorial, you've learned three methods for running commands in parallel in Linux:

  1. Using the & symbol to run commands in the background.
  2. Utilizing xargs with the -P option to define the maximum parallelism.
  3. Using GNU Parallel for advanced parallel job management.

Choose the method that best suits your requirements and optimize your workflow by executing commands concurrently.

fatboy93@lemm.ee 3 points 1 year ago

Love posts like this, because I can plug a tool that I recently found!

It's called ParaFly and I use it a lot on HPCs. It doesn't really have multi-node support, but it does offer logging and resuming of jobs.

So your point 3 is essentially this: ParaFly -c commands.txt -CPU N, where N is the number of jobs you want to run in parallel.

30021190@lemmy.cloud.aboutcher.co.uk 2 points 1 year ago

For anyone else reading this, please make sure this tool is appropriate for your HPC system.

I would be annoyed at my users if they tried using any of these tools without fully understanding them and correctly judging when to use the scheduler vs. parallelism.

fatboy93@lemm.ee 1 points 1 year ago

Absolutely! Sometimes it's just easier for me to keep jobs in a single list and run them on a big fat node rather than array-submit them and block half the queue!

root@lemmy.run 2 points 1 year ago

Hmm, I didn't know about ParaFly, so that's something I learned today as well 😀

leds@feddit.dk 2 points 1 year ago

Or the good old make -j
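For illustration, here is a minimal sketch of that idea, assuming GNU or BSD make. It uses three independent placeholder targets (a.txt, b.txt, c.txt, hypothetical names) in a throwaway Makefile and builds them with up to three jobs at once. Recipe lines in a Makefile must start with a tab, which is why the sketch writes it with printf '\t'.

```shell
#!/bin/sh
# Generate a throwaway Makefile with three independent targets.
dir=$(mktemp -d)
{
  printf 'all: a.txt b.txt c.txt\n'
  printf 'a.txt b.txt c.txt:\n'
  printf '\tsleep 1; echo built > $@\n'   # $@ expands to the target name
} > "$dir/Makefile"

# -j 3 lets make run up to three recipes at the same time.
(cd "$dir" && make -j 3)
```

Because make tracks dependencies between targets, -j only runs targets in parallel when they don't depend on each other.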