
This might not even be an issue, but I've got a couple of related Python questions that will hopefully help clear up a bit of debugging I've been stuck on for the past week or two.

If you call a function that returns a large object, is there a way to ignore the return value in order to save memory?

My best example: say you are piping a large text file, line by line, to another server, and when the pipe completes, the function returns a confirmation for every successful line. If you pipe too many lines, that returned list of confirmations could overrun your available memory.

for line in lines:
    connection.put(line)             # queue each line on the connection
response = connection.execute()      # returns one confirmation per line

Even if you never assign the `response` variable, I believe the return value is still built in memory, so is there a way to ignore/suppress the return value when you don't really care about the response?
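To illustrate the question (the function name is made up; this is only a sketch of the behaviour as I understand it):

def returns_large_object():
    # the callee builds this list regardless of what the caller does with it
    return ["OK"] * (10 ** 7)

returns_large_object()            # result is never bound to a name, so its
                                  # refcount drops to zero as soon as the call
                                  # returns and CPython frees it right away

kept = returns_large_object()     # binding it to a name is what keeps it alive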

More background: I'm using the redis-py package to pipeline a large number of set additions. My processes occasionally die with out-of-memory errors even though the file itself is not that big, and I'm not entirely sure why. This is just my latest hypothesis.
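For reference, roughly what the pipelining looks like, with the `execute()` calls chunked so the list of confirmations never grows unbounded (the key/member names, chunk size, and `parsed_lines` are made up; assumes the standard redis-py client):

import redis

r = redis.Redis()                       # assumes a local Redis instance
pipe = r.pipeline(transaction=False)    # buffer commands without MULTI/EXEC

for i, (key, member) in enumerate(parsed_lines, 1):   # hypothetical (key, member) tuples
    pipe.sadd(key, member)              # queue one set-addition
    if i % 10000 == 0:
        pipe.execute()                  # flush and drop the returned confirmations
pipe.execute()                          # flush whatever is left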

  • you could use generators to yield some data and perhaps save some memory? – PYA Jul 09 '17 at 01:31
  • I sincerely doubt this is the problem, however Python will free up memory as soon as that object has no pointers left to it. While you're right that `some_function_that_returns_a_large_object()` will allocate memory for its return, if you simply don't assign that to anything then Python will quickly garbage collect that object. – Adam Smith Jul 09 '17 at 03:07
  • Python objects are allocated on the heap, so by the time the function returns, it's too late. If you don't create a reference to it, the garbage collector will remove the object, but presumably in your case the OOM error would happen before the function completes. – C S Jul 09 '17 at 03:08
  • what exactly is your `lines` object? Did you actually load all the lines into memory? – C S Jul 09 '17 at 03:12
  • It's actually piped stdout from an lzop. Specifically, `lines` is a `parse(data.readlines())`, where `data` is the piped stdout from a `subprocess.Popen(['lzop','Udcf', filename])`, and then `parse()` pulls out the relevant info from each line and returns it as a small tuple. – Lucian Thorr Jul 09 '17 at 12:50
  • I'm two weeks into debugging an issue where multiple servers running this script all fail with "out of memory" faults at the same time while writing to the same Redis cache. So my latest guess is the Redis cache must be returning something large to the writing servers when it's getting overwhelmed. I might just be making things up at this point though. – Lucian Thorr Jul 09 '17 at 13:08
  • It looks like you are reading all the lines into memory with `data.readlines()`. That's probably not a good idea here. You'll get a large list loaded into memory (the actual memory allocated for the backing array will be larger). You can use `data.readline()` (note the singular) to return a line at a time, or iterate directly through the lines in `data` with a `for` loop; see the sketch just below. I'm afraid I can't shed much light beyond this. – C S Jul 09 '17 at 21:44
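A sketch of that last suggestion against the subprocess pipe described above (`filename`, the lzop flags, and the per-line `parse_line()` are placeholders):

import subprocess

proc = subprocess.Popen(['lzop', '-Udcf', filename], stdout=subprocess.PIPE)
for raw_line in proc.stdout:        # reads one line at a time, no readlines() list
    record = parse_line(raw_line)   # hypothetical per-line version of parse()
    # ... pipeline `record` as before ...
proc.wait()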

1 Answer


I don't think the confirmation response is big enough to overrun your memory. In Python, when you read all the lines from a file at once, the whole list of lines is kept in memory, and that is what costs a large amount of memory.
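A quick illustration of the difference (the filename and `process()` are placeholders):

with open(filename) as f:
    all_lines = f.readlines()   # the whole file is materialized as one list

with open(filename) as f:
    for line in f:              # lazy: only one line is held in memory at a time
        process(line)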

  • your statement is misleading. The typical way of reading lines from a file is `for line in file`, which reads lazily rather than loading everything at once. – C S Jul 09 '17 at 03:11