
Using GNU parallel, I am trying to run a sub-sampling script that takes two input files and writes a specific subsampled output file. I am using this command:

parallel -j+0 --eta python sub_sample_.2.py ::: file1 file2 ::: file3 file4 ::: file5 file6

But no ETA is shown on the command line. The output is:

Computer:jobs running/jobs completed/%of started jobs/Average seconds to complete
ETA: 0s Left 8 AVG:0.00s local:8/0/100%/0.0

Also, only the first four files are processed, not the last two: file5 and file6.

agc
Labrat

1 Answer

parallel -j+0 --eta python sub_sample_.2.py ::: file1 file2 ::: file3 file4 ::: file5 file6

Three input sources with two values each give 2*2*2 = 8 jobs in total.

Computer:jobs running/jobs completed/%of started jobs/Average seconds to complete
ETA: 0s Left 8 AVG:0.00s local:8/0/100%/0.0

The ETA is computed from the runtime of jobs that have finished. No jobs have finished yet, so there is no ETA. You can also see that all 8 jobs are running on your local system, so you likely have 8 or more cores.
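To make the "no finished jobs, no ETA" point concrete, here is a deliberately simplified model in bash. It is an assumption for illustration only, not GNU parallel's actual ETA code: the hypothetical function eta_estimate averages the runtimes of finished jobs and multiplies that average by the number of remaining rounds of jobs.

```bash
#!/usr/bin/env bash
set -eu

# Hypothetical, simplified ETA model (NOT GNU parallel's real formula).
# Usage: eta_estimate <jobs_left> <job_slots> [runtime_of_finished_job ...]
eta_estimate() {
  local left=$1 slots=$2
  shift 2
  # Without any finished jobs there is no runtime to average.
  if [ "$#" -eq 0 ]; then
    echo "no ETA yet"
    return 0
  fi
  local total=0 t
  for t in "$@"; do
    total=$((total + t))
  done
  local avg=$((total / $#))
  # Remaining jobs run <job_slots> at a time; round the number of rounds up.
  local rounds=$(( (left + slots - 1) / slots ))
  echo "ETA: $((avg * rounds))s"
}

eta_estimate 8 8               # prints: no ETA yet
eta_estimate 4 8 10 12 14 12   # prints: ETA: 12s
```

With no finished runtimes the model has nothing to average, which mirrors why --eta cannot give a useful estimate right after all 8 jobs have started.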

Also, only the first four files are processed, not the last two: file5 and file6.

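Given how this is written, I suspect you might not be aware of what multiple ::: do: GNU parallel combines every value of each input source with every value of the others, so each job gets three arguments, one from each :::. Run the command with --dryrun and check whether that is what you expect to be run.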
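The effect of several ::: can be imitated with nested loops. The sketch below is plain bash (it does not call GNU parallel, and cartesian_jobs is a hypothetical name) and prints the eight command lines formed from every combination of the three input sources, each with three file arguments. Running the real command with --dryrun should list the same eight combinations, possibly in a different order.

```bash
#!/usr/bin/env bash
set -eu

# Hypothetical helper: prints the command lines produced by combining three
# input sources the way multiple ::: do (every value with every other value).
cartesian_jobs() {
  local a b c
  for a in file1 file2; do
    for b in file3 file4; do
      for c in file5 file6; do
        # One job per combination; each job receives three arguments.
        echo "python sub_sample_.2.py $a $b $c"
      done
    done
  done
}

cartesian_jobs
```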

My guess is that what you really want to run is (requires version 20160422 or later):

parallel --eta python sub_sample_.2.py ::: file1 file3 file5 :::+ file2 file4 file6

Or:

parallel --xapply --eta python sub_sample_.2.py ::: file1 file3 file5 ::: file2 file4 file6
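For comparison, here is a similar plain-bash sketch of linked input sources (:::+ or --xapply), where the n-th value of the first source is paired with the n-th value of the second instead of with every value. As before, linked_jobs is a hypothetical helper that only prints the command lines; it yields three jobs instead of eight.

```bash
#!/usr/bin/env bash
set -eu

# Hypothetical helper: pairs the two input sources element by element,
# the way linked sources (:::+ or --xapply) do.
linked_jobs() {
  local first=(file1 file3 file5)
  local second=(file2 file4 file6)
  local i
  for i in "${!first[@]}"; do
    # The i-th value of one source goes with the i-th value of the other.
    echo "python sub_sample_.2.py ${first[$i]} ${second[$i]}"
  done
}

linked_jobs
```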
Ole Tange
  • Thanks. I was not aware that jobs had to finish in order to give an ETA. Just to be clear, is using :::+ the same as using --xapply? – Labrat Jun 23 '16 at 14:01
  • @Labrat The :::+ is a more advanced --xapply, but in this situation they do the same. – Ole Tange Jun 24 '16 at 11:32
  • If I am using a command that takes an output and an input argument, such as samtools sort -n, how would I go about this in parallel? – Labrat Jun 29 '16 at 14:19
  • parallel samtools sort -n {}.out {} ::: input*files – Ole Tange Jun 29 '16 at 15:13
  • One last thing. When I try to run: parallel --eta htseq-count -m intersection-nonempty -i Name -s reverse -f bam {} >{}_htseq_Counts.txt 2>{}_OUTPUT_WARNINGS_.txt ::: *.bam it writes the ETA to "{}_OUTPUT_WARNINGS_.txt" and all the counts to "{}_htseq_Counts.txt" instead of to separate files. Any help with this? – Labrat Jun 29 '16 at 18:46
  • Try this: parallel --eta htseq-count -m intersection-nonempty -i Name -s reverse -f bam {} '>{}_htseq_Counts.txt 2>{}_OUTPUT_WARNINGS_.txt' ::: *.bam – Ole Tange Jul 01 '16 at 22:48
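Why the quotes in the last comment matter can be sketched without GNU parallel. Unquoted, the shell you type into performs the redirections once, for parallel itself, before any job starts, so all job output (and parallel's own --eta output on stderr) goes to files literally named {}_htseq_Counts.txt and {}_OUTPUT_WARNINGS_.txt. Quoted, the redirections are handed to parallel and performed by the shell that runs each job, with {} already replaced. The bash below imitates both cases; demo_redirection is a hypothetical name, a.bam and b.bam are dummy files, and wc -c stands in for htseq-count.

```bash
#!/usr/bin/env bash
set -eu

# Hypothetical demo: `wc -c` stands in for htseq-count, a.bam/b.bam are dummy files.
# The function body is a subshell, so the cd does not affect the caller.
demo_redirection() (
  dir=$(mktemp -d)
  cd "$dir"
  printf 'aaa' > a.bam
  printf 'bbbbb' > b.bam

  # Unquoted case: the outer shell opens the redirection target once, so the
  # output of every "job" ends up in one file literally named {}_htseq_Counts.txt.
  for f in a.bam b.bam; do
    wc -c < "$f"
  done > {}_htseq_Counts.txt

  # Quoted case: each "job" runs in its own shell, which performs the
  # redirections itself with the file name filled in, so every input
  # gets its own pair of output files.
  for f in a.bam b.bam; do
    sh -c 'wc -c < "$1" > "$1_htseq_Counts.txt" 2> "$1_OUTPUT_WARNINGS_.txt"' _ "$f"
  done

  # List the resulting files, then clean up the temporary directory.
  ls
  rm -rf "$dir"
)

demo_redirection
```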