
I need your help with Queue memory.

1) I chose Queue as my data structure because I have one thread feeding data into the queue and another thread taking the data out (a sketch of this setup is below).
2) The two threads are designed to run for days.
3) I don't want to limit the queue size; the queue can get really long, say ~10k items occupying ~10GB. This is fine.
4) The problem: after I shrink the queue with get() down to only 20 items, which should occupy only ~100MB of memory, I print the size and I'm sure there are only 20 items left.
5) But at the system level, the whole process still occupies ~10GB.
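For reference, here is a minimal sketch of the setup from points 1) and 2): one producer thread, one consumer thread, and an unbounded Queue between them (the payload and the processing step are just placeholders):

import Queue
import threading
import time

q = Queue.Queue()          # unbounded, as in point 3)

def producer():
  # the thread that feeds data into the queue
  while True:
    q.put('some data')
    time.sleep(0.01)

def consumer():
  # the thread that takes the data back out
  while True:
    item = q.get()
    # ... work on item here ...

threading.Thread(target=producer).start()
threading.Thread(target=consumer).start()

while True:
  print 'queue size =', q.qsize()
  time.sleep(1.0)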

I tried to call

gc.collect()

by myself, but the memory doesn't change. So my guess is that the items returned by get() do get destroyed, but because the threads are always running, Python never decreases the capacity of the queue.

My question is: is there any way to free the memory that the queue isn't using right now? I can't find any API to do that.
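For the record, this is roughly how the "system level" number can be checked from inside the process, before and after gc.collect() (a Linux-only sketch that reads VmRSS from /proc/self/status; the exact figures will of course differ per machine):

import gc

def rss_mb():
  # current resident set size of this process, in MB (Linux only)
  with open('/proc/self/status') as f:
    for line in f:
      if line.startswith('VmRSS:'):
        return int(line.split()[1]) / 1024.0   # value is reported in kB

print 'RSS before gc.collect(): %.1f MB' % rss_mb()
gc.collect()
print 'RSS after gc.collect():  %.1f MB' % rss_mb()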

Update 1

Ubuntu 16.04, Python 2.7.12. I did some experiments today. My observation is that even when the queue is empty, the process still occupies about 84MB of system memory. Here is some code to reproduce my result.

First attempt: del

import Queue

q = Queue.Queue()
length = 10000000
buffer_size = 1000
index = 0
while index < length:
  q.put_nowait(1)
  index += 1
key = raw_input('finish insert, press key to pop')
while q.qsize() > buffer_size:
  a = q.get()
  del a
print 'after pop, q size = ', q.qsize()
raw_input('let me del the q')
del q
key = raw_input('finish delete')

Second attempt: clear()

import Queue

q = Queue.Queue()
length = 10000000
buffer_size = 1000
index = 0
while index < length:
  q.put_nowait(1)
  index += 1
key = raw_input('finish insert, press key to pop')
while q.qsize() > buffer_size:
  a = q.get()
  del a
print 'after pop, q size = ', q.qsize()
raw_input('let me del the q')
with q.mutex:
  q.queue.clear()
print 'q size = ', q.qsize()
key = raw_input('finish delete')

Third attempt: Queue()

import Queue

q = Queue.Queue()
length = 10000000
buffer_size = 1000
index = 0
while index < length:
  q.put_nowait(1)
  index += 1
key = raw_input('finish insert, press key to pop')
while q.qsize() > buffer_size:
  a = q.get()
  del a
print 'after pop, q size = ', q.qsize()
raw_input('let me del the q')
q = Queue.Queue()
print 'q size = ', q.qsize()
key = raw_input('finish delete')

Fourth attempt: gc.collect()

import Queue
import gc

q = Queue.Queue()
length = 10000000
buffer_size = 1000
index = 0
while index < length:
  q.put_nowait(1)
  index += 1
key = raw_input('finish insert, press key to pop')
while q.qsize() > buffer_size:
  a = q.get()
  del a
print 'after pop, q size = ', q.qsize()
raw_input('let me del the q')
#del q
#with q.mutex:
#  q.queue.clear()
q = Queue.Queue()
print 'q size = ', q.qsize()
raw_input('let me gc.collect')
gc.collect()
raw_input('how about now?')

None of these four approaches releases the memory held by the queue. Can anyone tell me what I'm doing wrong? Many thanks!

Some thoughts

It seems like the Python Queue reserves the largest memory capacity it has reached over its lifetime and tries to reuse that memory instead of allocating again. Compare this with a dynamic data structure such as C++'s std::vector, which typically doubles its capacity when size == capacity; I would expect a dynamic structure to also be able to shrink its capacity, say to half, once size/capacity drops to 0.25. Is there any way I can get that behaviour, or is the Python Queue designed this way?
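One more variant along the same lines as the clear() attempt: in CPython, Queue.Queue keeps its items in a collections.deque that it exposes as q.queue, so the whole container can be swapped for a fresh one while holding the mutex. Whether the process RSS actually shrinks afterwards presumably still depends on whether the allocator hands the freed memory back to the OS:

import collections
import Queue

q = Queue.Queue()
for i in xrange(1000000):
  q.put_nowait(1)

# swap in a brand-new deque so nothing references the old container any more
with q.mutex:
  q.queue = collections.deque()
print 'q size = ', q.qsize()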

yuan
  • Python has a reference-counting garbage collector. If the elements from the queue are still used somewhere in your program, they are not getting garbage collected. – Stephane Martin Dec 28 '17 at 23:35
  • Have you any code to show? – Eineki Dec 28 '17 at 23:35
  • Hi, thank you for the reply! I've added some test code. I tested it on Ubuntu 16.04 with Python 2.7.12. Can you see what I'm doing wrong here? – yuan Dec 29 '17 at 17:36
  • Having the same problem. Did you figure out how to clear the memory? – Mike Azatov Sep 28 '20 at 13:58
  • It may be a kind of Numpy version issue. https://stackoverflow.com/questions/54419043/memory-use-if-multiprocessing-queue-is-not-used-by-two-separate-processes – user17324030 Nov 04 '21 at 02:24
  • Was this issue resolved by any chance? @Basil Musa @yuan I'm having the same problem while using `multiprocessing` library and using `mp.Queue`. I'm surprised that even setting maxsize does not help. – vish4071 Apr 11 '22 at 22:20

2 Answers


Call q.task_done() after q.get().

ref: https://docs.python.org/3/library/queue.html#queue.Queue.task_done

Also see https://bugs.python.org/issue43911
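A minimal sketch of the get()/task_done() pattern this answer describes (the items and the processing step are placeholders):

import Queue

q = Queue.Queue()
for i in range(5):
  q.put(i)

while not q.empty():
  item = q.get()
  # ... process item here ...
  q.task_done()   # mark this item as fully processed

q.join()          # returns once every put() has a matching task_done()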

Sven Eberth
  • The memory issue is not related to task_done. I experimented with both calling task_done and not calling it. No difference in memory. Memory gets freed at the end. – Basil Musa Jul 13 '21 at 16:46

I came across similar code (Python 2.7.5) while debugging a memory leak issue. Passing a maxsize to the Queue constructor helps in practical scenarios where the producer and consumer threads work at similar speeds and a very large backlog is not required:

q = Queue.Queue(100)
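For illustration, once maxsize items are waiting, a plain put() blocks and put_nowait() raises Queue.Full, so the producer cannot run arbitrarily far ahead of the consumer (the bound of 100 here is just an example):

import Queue

q = Queue.Queue(maxsize=100)
for i in range(100):
  q.put_nowait(i)          # fills the queue up to its limit

try:
  q.put_nowait(100)        # one item too many
except Queue.Full:
  print 'queue is full; the producer has to wait for get() calls'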
pdp