
I have a Django web server running on an AWS EC2 instance with 1GB of RAM. When a certain request is made to the web server, I need to run an executable using subprocess.call('./executable'). The executable runs a Perl script which does some file I/O and then some computations on the data parsed from the files, nothing too crazy.
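For reference, here is a minimal sketch of the kind of view I mean; the view function name and response handling are simplified placeholders, not my actual code:

    import subprocess

    from django.http import HttpResponse


    def run_job(request):
        # Blocks this request until the Perl-wrapping executable finishes.
        return_code = subprocess.call("./executable")
        if return_code != 0:
            return HttpResponse("job failed", status=500)
        return HttpResponse("job finished")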

I began running into memory allocation issues which caused my web server to crash, so I experimented with setting hard limits on the virtual memory allocated to each subprocess using ulimit -v some_value. I found that each subprocess needs around 100MB to run without erroring out, so it's no surprise that I'm running into memory issues with only 1GB of RAM.
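For what it's worth, the same cap can also be applied from Python itself rather than through the shell. This is just a sketch using the ~100MB figure above; note that resource.RLIMIT_AS is specified in bytes, while ulimit -v takes KiB:

    import resource
    import subprocess

    LIMIT_BYTES = 100 * 1024 * 1024  # the ~100MB per-subprocess floor I observed


    def cap_virtual_memory():
        # Runs in the child between fork() and exec(), so the limit
        # applies only to the subprocess, not to the web server.
        resource.setrlimit(resource.RLIMIT_AS, (LIMIT_BYTES, LIMIT_BYTES))


    subprocess.call("./executable", preexec_fn=cap_virtual_memory)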

I'm wondering, though, why this memory usage is so high. Is a lot of extra memory being allocated because I'm calling subprocess.call from a process that's running a web server which is memory-intensive? Is running an executable that runs a Perl script necessarily memory intensive because Perl has some overhead or something? Would it use much less memory if the Perl script were re-written in Python and run directly in the Django web server?

Would greatly appreciate any and all help on this one. Thanks!

Solomon

1 Answer


There have been some great comments from people more knowledgeable about the specifics of kernels, processes, and memory than I am! Check them out.

I don't really have a definitive answer for you, but I can hopefully shed some light here:

The cause of the memory usage, and hence the out-of-memory exception, is explained in this SO answer: Python memory allocation error using subprocess.Popen.

This is a very common problem in Unix, and actually has nothing to do with Python or bioinformatics. A call to os.fork() temporarily doubles the memory of the parent process (the memory of the parent process must be available to the child process), before throwing it all away to do an exec(). While this memory isn't always actually copied, the system must have enough memory to allow for it to be copied, and thus if your parent process is using more than half of the system memory and you subprocess out even "wc -l", you're going to run into a memory error.
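To make the failure mode concrete, here is a rough sketch. Whether it actually fails depends on the kernel's overcommit policy (vm.overcommit_memory), so treat it as an illustration rather than a guaranteed reproduction:

    import subprocess

    # Make the parent large, as a memory-hungry web process would be.
    payload = bytearray(700 * 1024 * 1024)  # ~700MB on a 1GB box

    try:
        # Even a trivial child requires the kernel to be willing to
        # commit a copy of the parent's address space at fork() time.
        subprocess.call("true")
    except OSError as exc:
        print("fork failed:", exc)  # e.g. [Errno 12] Cannot allocate memory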

You should look at reimplementing this Perl script in Python and using that module/package directly in your view, so that no extra thread/process has to be spawned in your request-response cycle to handle this.
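A sketch of that direction, with a hypothetical myapp.report module standing in for the ported Perl logic:

    from django.http import HttpResponse

    # Hypothetical module: the Perl script's logic ported to Python.
    from myapp.report import parse_and_compute


    def run_job(request):
        # No fork()/exec(): the work runs inside the existing web process.
        result = parse_and_compute("/path/to/data")  # placeholder input path
        return HttpResponse(result)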

Another thing that may or may not be relevant: if this is a long-running or CPU-intensive task, you should consider processing it in a background job using something like Celery or Python RQ. This keeps your server responding to requests quickly and avoids a backlog of requests, where no one else can reach the server because twenty requests are still being worked on behind this long-running task. The choice of workers would depend on your needs, deadlines, etc.
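For example, a minimal Celery sketch; it assumes a Celery app is already configured for the project, and the task name is made up:

    import subprocess

    from celery import shared_task


    @shared_task
    def run_report():
        # Executes in a worker process, so the web process is free to
        # keep serving requests while the job does its file I/O.
        return subprocess.call("./executable")

The view would then call run_report.delay() and respond immediately, perhaps returning a job ID the client can poll.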

A. J. Parr
  • That quoted passage is quite misleading. "A call to os.fork() temporarily doubles the memory of the parent process" means "The child process created by os.fork() will temporarily use as much memory as the parent process", but even that is misleading because it doesn't actually use up any memory at all; it's all shared with the parent process. – ikegami Sep 12 '18 at 06:29
  • I think "A call to os.fork() temporarily doubles the memory of the parent process (the memory of the parent process must be available to the child process)," describes exactly what you're saying at "it doesn't actually use up any memory at all; it's all shared with the parent process." – A. J. Parr Sep 12 '18 at 06:42
  • No, the size of the memory of the parent process doesn't change at all, but that passage makes it sound like it does. If the size of the parent is 50 MiB before the fork, you end up with a 50 MiB parent and a 50 MiB child that use 50 MiB total combined. After the exec, the child's usage could drop (say to 10 MiB), though the total will increase (to 60 MiB in our example). – ikegami Sep 12 '18 at 06:43
  • Ah, I think I understand your meaning now. The child process requires as much memory as the parent process, however the parent process does not allocate extra memory for itself? – A. J. Parr Sep 12 '18 at 06:47
  • fork doesn't allocate any memory at all. – ikegami Sep 12 '18 at 06:47
  • You are being very nitpicky, and I'm happy to improve my answer, but why do you say no memory is allocated? – A. J. Parr Sep 12 '18 at 06:50
  • @ARJMP - The memory used by the child is (initially) the _exact same memory_ used by the parent. The linux kernel supports "copy on write" memory, in which multiple processes can reference the same physical memory and it is only copied when one of them writes to that memory. As long as it's only being read, no additional memory is required. – Dave Sherohman Sep 12 '18 at 06:52
  • Ah, I understand what you're saying a bit better now, thanks for that. The system does still need to check there is enough memory when you fork the process, though, so I suppose it may not be allocated permanently, but without enough available memory the system will raise an OOM error when trying to fork the process? – A. J. Parr Sep 12 '18 at 07:04
  • It's not possible for there to not be enough available memory, as forking takes no memory. – ikegami Sep 12 '18 at 08:01
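A Linux-only sketch illustrating the copy-on-write point made in these comments: system-wide available memory barely moves when a large parent forks, because the child initially shares the parent's pages (numbers from /proc/meminfo are approximate):

    import os
    import time


    def mem_available_kib():
        with open("/proc/meminfo") as f:
            for line in f:
                if line.startswith("MemAvailable:"):
                    return int(line.split()[1])


    payload = bytearray(200 * 1024 * 1024)
    for i in range(0, len(payload), 4096):
        payload[i] = 1  # touch every page so the parent is really resident

    before = mem_available_kib()
    pid = os.fork()
    if pid == 0:
        time.sleep(2)  # keep the child alive while the parent measures
        os._exit(0)

    time.sleep(1)
    print("drop after fork: %d KiB" % (before - mem_available_kib()))
    os.waitpid(pid, 0)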