
I have written a short Python script to process my big FASTQ files, which range in size from 5 GB to 35 GB. I am running the script on a Linux server that has many cores. The script is not parallelized at all and takes about 10 minutes per file on average.

If I run the same script on several files, like so:

$ python my_script.py file1 & 
$ python my_script.py file2 & 
$ python my_script.py file3 & 

using the & sign to send each process to the background,

do those scripts run in parallel, and will I save time?

It seems not to me: I am using the top command to check processor usage, and each process's usage drops as I add new runs. Shouldn't each one be using something close to 100%?

So if they are not running in parallel, is there a way to make the OS run them in parallel?

Thanks for any answers.


2 Answers


Commands executed this way do indeed run in parallel. The reason they're not using 100% of your CPU time might be that they are I/O-bound rather than CPU-bound. The description of what the script does ("big FASTQ files in size from 5 GB to 35 GB") suggests that this might well be the case.

If you look at the process list given by ps, though, you should see three python processes there, unless one or more of them has terminated by the time you run ps.
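
If you would rather manage the parallelism from inside Python instead of backgrounding shell jobs, the standard library's multiprocessing module can do it. A minimal sketch, assuming a hypothetical process_file function standing in for whatever your script does to one file:

import sys
from multiprocessing import Pool

def process_file(path):
    # Hypothetical stand-in for the real per-file work.
    with open(path) as handle:
        for line in handle:
            pass  # parse the FASTQ records here
    return path

if __name__ == "__main__":
    files = sys.argv[1:]
    # One worker per file, capped at 4 so the disk is not oversubscribed.
    with Pool(processes=min(4, max(1, len(files)))) as pool:
        for done in pool.imap_unordered(process_file, files):
            print("finished", done)

Note that if the jobs really are I/O-bound, adding workers beyond what the storage can feed will not make them finish any faster.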

Daniel Kamil Kozar

Time spent waiting on I/O operations is accounted as a different kind of CPU usage, usually shown as %wa. You are probably looking only at %us (user CPU time).
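
If you want to see that split without eyeballing top, here is a minimal sketch that samples the aggregate counters in /proc/stat (Linux-specific; the first line lists cumulative user, nice, system, idle and iowait jiffies, in that order):

import time

def cpu_times():
    # First line of /proc/stat: "cpu  user nice system idle iowait ..."
    with open("/proc/stat") as f:
        return [int(x) for x in f.readline().split()[1:]]

before = cpu_times()
time.sleep(5)
after = cpu_times()

delta = [b - a for a, b in zip(before, after)]
total = sum(delta)
print("%us {:.1f}  %wa {:.1f}".format(100 * delta[0] / total,
                                      100 * delta[4] / total))

A high %wa with a low %us over the sampling window is the signature of processes stuck waiting on the disk.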

Hristo Iliev