
I have a Python script, A.py, that takes as an argument a target file containing a list of IPs and outputs a CSV file with information found about the IPs from some sources. (Run method: python A.py Input.txt -c Output.csv)

It took ages to get the work done. Later, I split the input file (split -l 1000 Input.txt), created 10 directories, and ran the script against the split inputs in those directories in parallel, each in a screen session (sketched below).
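The manual approach looked something like this (the chunk_ prefix, per-chunk output names, and session names here are illustrative, not my exact commands):

split -l 1000 Input.txt chunk_        # produces chunk_aa, chunk_ab, ...
for f in chunk_*; do
    # one detached screen session per chunk
    screen -dmS "job_$f" python A.py "$f" -c "$f.csv"
done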

How can I do this kind of job efficiently? Any suggestions, please?

Arun

1 Answer


Try this:

parallel --round --pipepart -a Input.txt --cat python A.py {} -c {#}.csv

If A.py can read from a fifo then this is more efficient:

parallel --round --pipepart -a Input.txt --fifo python A.py {} -c {#}.csv

If your disk has long seek times, it might be faster to use --pipe instead of --pipepart.
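Each job writes its own numbered CSV (1.csv for job 1, 2.csv for job 2, and so on; see the comments below). If you want a single Output.csv afterwards, something like this should work, assuming every per-job CSV starts with the same one-line header:

head -n 1 1.csv > Output.csv
tail -q -n +2 [0-9]*.csv >> Output.csv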

Ole Tange
  • @Ole Thanks for the reply. I cannot see any process running. I get a prompt on the screen: "parallel: Warning: Input is read from the terminal. Only experts do this on purpose. Press CTRL-D to exit." And what does {#}.csv mean? Does it mean any CSV file in the directory? – Arun Dec 02 '15 at 08:58
  • Are you being hit by http://stackoverflow.com/questions/16448887/gnu-parallel-not-working-at-all ? – Ole Tange Dec 02 '15 at 16:21
  • {#} is the substitution string for the job number. So it will pass 1.csv to the first job, 2.csv to the next and so on. – Ole Tange Dec 02 '15 at 16:22
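For completeness, a toy run makes the {#} substitution visible (::: feeds arguments directly on the command line):

parallel echo {#}.csv ::: a b c
# prints 1.csv, 2.csv, 3.csv (output order may vary)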