
I'm making a small pipeline for chewing through a large amount of data, and I've decided to use Python to run an external program on multiple cores.

So here are my questions:

1) The program outputs a very big text file. I only want to save the output to a new file (not hold it as a string in a Python object). What's the best way to do this using the subprocess module?

2) I want to call the program many times in parallel using the multiprocessing module. I normally just go the simple way and use Pool.map. Will this interfere with the subprocess module?

Thanks in advance!

Misconstruction

1 Answer


1) The program outputs a very big text file. I only want to save the output to a new file (not hold it as a string in a Python object). What's the best way to do this using the subprocess module?

If you look at the documentation, valid values for stdout are:

PIPE, an existing file descriptor (a positive integer), an existing file object, and None.

So:

import subprocess

with open('new_file.txt', 'w') as outfile:
    subprocess.call(['program', 'arg'], stdout=outfile)  # child writes straight to the file

2) I want to call the program many times in parallel using the multiprocessing module. I normally just go the simple way and use Pool.map. Will this interfere with the subprocess module?

Not unless you do certain odd things.

multiprocessing.Pool keeps track of which processes it created, and won't try to manage other child processes that happen to get created elsewhere, so the obvious thing you're worried about isn't an issue.
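Putting the two answers together, a pool task can simply open its own output file and call the program. This is only a sketch: `run_one`, the job tuples, and the output file names are made up for illustration, and a child Python interpreter stands in for the real program so the example runs anywhere.

```python
import os
import subprocess
import sys
import tempfile
from multiprocessing import Pool


def run_one(job):
    """Run one external command, sending its stdout straight to a file."""
    arg, out_path = job
    # Stand-in for the real program: a child Python that echoes its argument.
    cmd = [sys.executable, "-c", "import sys; print(sys.argv[1])", arg]
    with open(out_path, "w") as outfile:
        # call() blocks until the child exits, so each task reaps its own child.
        return subprocess.call(cmd, stdout=outfile)


if __name__ == "__main__":
    out_dir = tempfile.mkdtemp()
    jobs = [(str(i), os.path.join(out_dir, "out_{}.txt".format(i)))
            for i in range(4)]
    with Pool(processes=2) as pool:
        return_codes = pool.map(run_one, jobs)
    print(return_codes)
```

Each worker only ever deals with the child it started itself, and the pool only manages its own workers, so the two layers stay out of each other's way.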

The most common problem I've seen is using Popen to create child processes that you never reap. You'll often get away with this in an app without multiprocessing, but as soon as you do the Popen-and-leak in a pool task, you stop getting away with it. (This isn't really anything about multiprocessing or Python; it's just that grandchild processes aren't the same as child processes.)
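If you do need Popen rather than call inside a pool task, the fix is to make sure every child gets waited on. Here is a minimal sketch of that idea; `run_and_reap` is a hypothetical helper name, and the kill-on-interrupt behaviour is my own choice rather than anything the subprocess module requires.

```python
import subprocess


def run_and_reap(cmd, out_path):
    """Start cmd with Popen, send stdout to out_path, and wait for it to exit."""
    with open(out_path, "w") as outfile:
        proc = subprocess.Popen(cmd, stdout=outfile)
        try:
            # wait() collects the exit status, so no zombie is left behind.
            return proc.wait()
        except BaseException:
            # If we are interrupted, stop the child instead of orphaning it.
            proc.kill()
            proc.wait()
            raise
```

For the simple run-and-wait case this is essentially what subprocess.call already does for you, which is why the one-liner in the first part is usually enough.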

abarnert
  • Thanks for the answer. Is there any particular reason for you using `with open`? – Misconstruction Sep 25 '13 at 22:31
  • @Misconstruction: You should almost _always_ use `with` when you use `open`. Otherwise, you need some other way to guarantee that `outfile.close()` gets called no matter what, even on an exception or an early return or whatever. (Especially with writable files, where failing to close the file may mean the last few lines never make it to disk.) In this particular case… there's really no risk, but even here, it saves a few keystrokes, and it means I don't have to _think_ about whether there's a risk. – abarnert Sep 25 '13 at 23:01
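To make the point in that last comment concrete, here is a rough sketch of what `with open(...)` saves you from writing by hand. Both function names are made up for illustration; each returns whether the file ended up closed.

```python
def write_with_context_manager(path, text):
    # The file is closed when the block exits, even if write() raises.
    with open(path, "w") as outfile:
        outfile.write(text)
    return outfile.closed


def write_with_try_finally(path, text):
    # Roughly the hand-written equivalent of the version above.
    outfile = open(path, "w")
    try:
        outfile.write(text)
    finally:
        outfile.close()
    return outfile.closed
```

The two behave the same way here; the `with` form is just shorter and harder to get wrong.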