
I have a Python script, A.py, that takes as an argument a target file containing a list of IPs and outputs a CSV file with information found about the IPs from some sources. (Run method: python A.py Input.txt -c Output.csv)

It took ages to get the work done. Later, I split the input file (split -l 1000 Input.txt), created 10 directories, and ran the script against the split inputs in those directories in parallel, each in a screen session (sketched below).
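The manual approach looked something like this (the chunk_ prefix, per-chunk output names, and session names here are illustrative, not my exact commands):

split -l 1000 Input.txt chunk_        # produces chunk_aa, chunk_ab, ...
for f in chunk_*; do
    # one detached screen session per chunk
    screen -dmS "job_$f" python A.py "$f" -c "$f.csv"
done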

How can I do this kind of job efficiently? Any suggestions, please?

Arun

1 Answer


Try this:

parallel --round --pipepart -a Input.txt --cat python A.py {} -c {#}.csv

If A.py can read from a fifo then this is more efficient:

parallel --round --pipepart -a Input.txt --fifo python A.py {} -c {#}.csv

If your disk has long seek times, it might be faster to use --pipe instead of --pipepart.
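Each job writes its own numbered CSV (1.csv for job 1, 2.csv for job 2, and so on; see the comments below). If you want a single Output.csv afterwards, something like this should work, assuming every per-job CSV starts with the same one-line header:

head -n 1 1.csv > Output.csv
tail -q -n +2 [0-9]*.csv >> Output.csv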

Ole Tange
  • @Ole Thanks for the reply. I cannot see any process running. I get a prompt on the screen: "parallel: Warning: Input is read from the terminal. Only experts do this on purpose. Press CTRL-D to exit." And what does {#}.csv mean? Does it mean any CSV file in the directory? – Arun Dec 02 '15 at 08:58
  • Are you being hit by http://stackoverflow.com/questions/16448887/gnu-parallel-not-working-at-all ? – Ole Tange Dec 02 '15 at 16:21
  • {#} is the substitution string for the job number. So it will pass 1.csv to the first job, 2.csv to the next and so on. – Ole Tange Dec 02 '15 at 16:22
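For completeness, a toy run makes the {#} substitution visible (::: feeds arguments directly on the command line):

parallel echo {#}.csv ::: a b c
# prints 1.csv, 2.csv, 3.csv (output order may vary)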