 a) I want to run 2 scripts in parallel

b) I want to run the for loops within those scripts in parallel.

Before I had this code:

for year in 2000 2001 2002 2003; do

  echo $year" LST data being merged"

  cd $base_data_dir/$year

  # this is the part that takes a long time
  cdo -f nc2 mergetime *.nc $output_dir/LST_$year.nc

done

I wanted to use GNU Parallel to try and run this in parallel.

I tried the following:

a) Create a 'controller' script that calls other scripts

b) pass in an array as arguments to GNU parallel

The controller script

# 1. Create monthly LST for each year

cd $working_dir
seq 2000 2003 | parallel 'bash create_yearly_LST_files.sh {}'

# 2. Create monthly NDVI for each year

cd $working_dir
seq 2000 2003 | parallel 'bash create_yearly_NDVI_files.sh {}'

This should be running the following in parallel:

bash create_yearly_LST_files.sh 2000
bash create_yearly_LST_files.sh 2001
...

bash create_yearly_NDVI_files.sh 2000
bash create_yearly_NDVI_files.sh 2001
...

The processing script (the same for NDVI)

year="$1"
echo $year" LST data being merged"
cd $base_data_dir/$year

cdo -f nc2 mergetime *.nc $output_dir/LST_$year.nc

So the commands should read:

cd $base_data_dir/2000
cdo -f nc2 mergetime *.nc $output_dir/LST_2000.nc

cd $base_data_dir/2001
cdo -f nc2 mergetime *.nc $output_dir/LST_2001.nc
...

cd $base_data_dir/2000
cdo -f nc2 mergetime *.nc $output_dir/NDVI_2000.nc

cd $base_data_dir/2001
cdo -f nc2 mergetime *.nc $output_dir/NDVI_2001.nc
...

My Question:

The new code still runs correctly, but there was no speed-up.

Can anyone help me understand how to pass each year to be run in parallel?

And how can I also run both scripts in parallel (create_yearly_LST_files.sh and create_yearly_NDVI_files.sh)?

Tommy Lees

3 Answers

With GNU Parallel:

cd $working_dir
parallel 'cd {}; cdo -f nc2 mergetime *.nc xxx/LST_{}.nc' ::: {2000..2003}
Mark Setchell
  • Hi @Mark will the empty curly brackets in `cd {}` have to be filled in with `cd{$base_data_dir/$year}`? Apologies bash is somewhat new to me! – Tommy Lees Jun 20 '18 at 07:55
  • No. As each one of the parallel jobs is started, the `{}` is filled in (by **GNU Parallel**) with the current parameter, so it will fill it in with 2000 for the first job, 2001 for the second and so on. – Mark Setchell Jun 20 '18 at 08:30
  • You can use `parallel --dry-run ...` if you want to see what it would execute without actually executing anything. – Mark Setchell Jun 20 '18 at 08:56
  • You can use `man parallel` to see the built-in help, and press SPACEBAR to move forwards by a page and `q` to quit. – Mark Setchell Jun 20 '18 at 08:58
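To make the `{}` substitution and the `--dry-run` suggestion concrete, here is a minimal sketch that only prints the commands GNU Parallel would run. The command string is the one from the answer (including its `xxx` placeholder); the check for GNU Parallel and the `preview` variable are additions for this sketch, not part of the original answer.

```shell
#!/usr/bin/env bash
# Preview the generated commands instead of executing them.
# The guard below is an assumption of this sketch: it skips the preview
# when GNU Parallel is not installed on the machine.
preview=""
if parallel --version 2>/dev/null | grep -q 'GNU parallel'; then
  # -k keeps the output in input order; {} is replaced by each year in turn.
  preview=$(parallel -k --dry-run 'cd {}; cdo -f nc2 mergetime *.nc xxx/LST_{}.nc' ::: {2000..2003})
  printf '%s\n' "$preview"
else
  echo "GNU Parallel not found; nothing to preview" >&2
fi
```

Each printed line should show `{}` replaced by one year, which is a cheap way to check the command before committing to a long run.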
What is stopping you from doing

for year in 2000 2001 2002 2003; do

  echo $year" LST data being merged"

  cd $base_data_dir/$year

  # this is the part that takes a long time
  cdo -f nc2 mergetime *.nc $output_dir/LST_$year.nc &

done
wait
jeremysprofile
  • does the `&` mean that the later `$year`'s in the loop will also be run on different cores? – Tommy Lees Jun 19 '18 at 16:22
  • `&` puts the task in the background; it means it doesn't wait for the command to finish before going on. `wait` at the bottom waits for all the background tasks to finish before continuing with your script. Linux is smart and will use different cores, yes. – jeremysprofile Jun 19 '18 at 16:24
  • where do other `echo` statements get written to? Say if I had a line `echo $year " LST data merged"` where would that be written? – Tommy Lees Jun 19 '18 at 16:31
  • All echo statements are written to the same place, stdout. – jeremysprofile Jun 19 '18 at 16:33
  • will the `wait` command stop me from running the next process in parallel too (being `bash create_yearly_NDVI_files.sh`)? I have updated the question to include this further complication. – Tommy Lees Jun 19 '18 at 16:50
  • No. It will only wait for background processes spawned by the current script. – jeremysprofile Jun 19 '18 at 16:53
  • That's amazing, thank you for simplifying my problem! Do you have any recommendations for tightening up my question so others can read it? – Tommy Lees Jun 19 '18 at 16:56
  • Delete it? Wherever you want something to run in parallel, just use `&` and `wait`. That includes the controller script. This is already on SO in many places; there is little point in retaining your question for posterity. – jeremysprofile Jun 19 '18 at 16:59
  • do you mind if I incorporate your answer into a complete answer of how it was fixed? – Tommy Lees Jun 19 '18 at 17:03
  • Yes, I mind. I answered your original question and you neither upvoted nor accepted my answer. – jeremysprofile Jun 19 '18 at 17:21
  • Hi @jeremysprofile I'm very sorry I only just got the privileges to up vote but have done it immediately. I didn't know how to accept the answer! I'm new to stack overflow but thank you so much for your help. – Tommy Lees Jun 19 '18 at 21:24
  • Let us [continue this discussion in chat](https://chat.stackoverflow.com/rooms/173447/discussion-between-tommy-lees-and-jeremysprofile). – Tommy Lees Jun 20 '18 at 07:52
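The `&` and `wait` pattern discussed in these comments, applied both inside each script and in the controller, might look roughly like the sketch below. It is only an illustration: `merge_year`, `merge_product`, the `sleep`, and the temporary output directory are hypothetical stand-ins for the real `cdo ... mergetime` step and the two scripts. Each job runs in a subshell so that a `cd` inside one job cannot affect the next loop iteration.

```shell
#!/usr/bin/env bash
# Stand-in for the slow `cdo ... mergetime` step so the sketch runs anywhere.
# merge_year, merge_product, the sleep and the temp directory are assumptions.
output_dir=$(mktemp -d)

merge_year() {
  local product=$1 year=$2
  sleep 1   # pretend this is the expensive merge
  echo "$product $year merged" > "$output_dir/${product}_${year}.txt"
}

merge_product() {
  local product=$1 year
  for year in 2000 2001 2002 2003; do
    # The subshell keeps any `cd` local to the job; & sends it to the background.
    ( merge_year "$product" "$year" ) &
  done
  wait   # block until every year of this product has finished
}

start=$SECONDS
merge_product LST &    # both "scripts" start at once...
merge_product NDVI &
wait                   # ...and the controller waits for both
elapsed=$(( SECONDS - start ))
echo "finished 8 jobs in about ${elapsed}s"
ls "$output_dir"
```

Run one after another, the eight one-second jobs would take about eight seconds; run this way they overlap. Note that this starts every job at once, whereas GNU Parallel limits the number of simultaneous jobs.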
Maybe this will work:

doit() {
  cd "$base_data_dir"/"$1"
  cdo -f nc2 mergetime *.nc "$output_dir/${2}_${1}.nc"
}
export -f doit
export base_data_dir
export output_dir
parallel doit ::: {2000..2018} ::: LST NDVI
Ole Tange
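With two `:::` input sources, GNU Parallel generates one job per combination of year and product. As a rough plain-bash sketch of that job list (not how GNU Parallel is implemented), the nested loop below builds the same argument pairs; the `jobs_list` array and the shortened year range are assumptions made for illustration.

```shell
#!/usr/bin/env bash
# Plain-bash sketch of the job list produced by two ::: input sources.
# The first source (years) is the outer loop; the last source (products)
# varies fastest.
jobs_list=()
for year in {2000..2002}; do   # shortened range, for illustration only
  for product in LST NDVI; do
    jobs_list+=("doit $year $product")
  done
done
printf '%s\n' "${jobs_list[@]}"
```

Inside `doit`, `$1` is therefore the year and `$2` the product, which is why the output file is named `${2}_${1}.nc`.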