
I am using the `subprocess.Popen()` function to run an external tool that reads and writes a large amount of data (more than a gigabyte) to stdout. However, I'm finding that the kernel is killing the Python process when it runs out of memory:

Out of memory: Kill process 8221 (python) score 971 or sacrifice child
Killed process 8221 (python) total-vm:8532708kB, anon-rss:3703912kB, file-rss:48kB

Since I know I'm handling a large amount of data, I've set up `Popen` to write stdout and stderr to files so I'm not using pipes. My code looks something like this:

import subprocess

errorFile = open(errorFilePath, "w")
outFile = open(outFilePath, "w")

# Use Popen to run the command, sending stdout and stderr straight to files
try:
    procExecCommand = subprocess.Popen(commandToExecute, shell=False,
                                       stderr=errorFile, stdout=outFile)
    exitCode = procExecCommand.wait()

except Exception, e:
    # Write the exception to the error log
    errorFile.write(str(e))

errorFile.close()
outFile.close()

I've tried changing the `shell` parameter to True and setting the `bufsize` parameter to -1, also with no luck.

I've profiled the memory usage when running this via the Python script and via bash, and I see a much bigger spike in memory usage when running via Python compared to bash.

I'm not sure what exactly Python is doing to consume so much more memory than just using bash, unless it has something to do with trying to write the output to the file? The bash script simply redirects the output to a file.

I initially found that my swap space was quite low, so I increased it, which helped at first, but as the volume of data grows I start running out of memory again.

So is there anything I can do in Python to handle these data volumes better, or is it just a case of recommending more memory with plenty of swap space? That, or jettisoning Python altogether.

System details:

  • Ubuntu 12.04
  • Python 2.7.3
  • The tool I'm running is mpileup from samtools.
craigb
  • You could try to run the process as `Popen("myprocess -arg > output",shell=True)`. i.e. send the exact string you would use in bash into `Popen` with `shell=True`. – mgilson Jul 17 '12 at 14:21
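A minimal sketch of mgilson's suggestion, letting the shell handle the redirection exactly as the bash script does; the mpileup arguments and output file names here are hypothetical:

import subprocess

# Hypothetical command line; the shell performs the redirection to files
exitCode = subprocess.Popen("samtools mpileup input.bam > out.txt 2> err.txt",
                            shell=True).wait()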

2 Answers


The problem might be that you are using the `wait()` method (as in `procExecCommand.wait()`), which tries to run the subprocess to completion and then returns. Try the approach used in this question, which uses e.g. `stdout.read()` on the process handle. This way you can regularly empty the pipe, write to files, and there should be no build-up of memory.
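For example, a minimal sketch of that approach, reusing the hypothetical names commandToExecute, outFilePath and errorFile from the question; note that it requires switching stdout from the file handle to a pipe (see glglgl's comment below):

import subprocess

# stdout is now a pipe that we drain ourselves; stderr still goes to a file
procExecCommand = subprocess.Popen(commandToExecute, shell=False,
                                   stderr=errorFile, stdout=subprocess.PIPE)

with open(outFilePath, "w") as outFile:
    while True:
        chunk = procExecCommand.stdout.read(64 * 1024)  # read 64 KiB at a time
        if not chunk:  # an empty string means the child closed stdout
            break
        outFile.write(chunk)

exitCode = procExecCommand.wait()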

ThomasH
  • With `stderr=errorFile, stdout=outFile`, `errorFile` and `outFile` being regular `open()`ed files, there is no `procExecCommand.stdout`... – glglgl Jul 17 '12 at 14:47

What kind of output does your process generate? Maybe the clue is in that.

Warning: the script below won't terminate; you have to kill it.

This sample setup works as expected for me.

import subprocess

fobj = open("/home/tst/output", "w")

subprocess.Popen("/home/tst/whileone", stdout=fobj).wait()

And `whileone`:

#!/bin/bash

let i=1
while [ 1 ]
do
 echo "We are in iteration $i"
 let i=$i+1
 usleep 10000
done
tuxuday