
I am using several generators inside a heap queue to iterate through sorted files on disk. Often the heap queue is not completely drained before it goes out of scope, so the underlying generators never reach a StopIteration condition.

I'd like to attach a handler to the generator, or use some other elegant mechanism, to delete the files on disk when the generator goes out of scope. The files are temporary, so it's fine to delete them; however, if they aren't deleted, the program will eventually fill up the disk with temporary files. Below is the generator for reference:

import struct

def _read_score_index_from_disk(file_name, buffer_size=8*10000):
    """Generator to yield a float/int value from a file, does buffering
    and file management to avoid keeping the file open while the function
    is not invoked"""

    file_buffer = b''
    file_offset = 0
    buffer_offset = 0

    while True:
        # refill once the current buffer is fully consumed; '>=' (not '>')
        # so a record boundary at the end of a buffer isn't mistaken for EOF
        if buffer_offset >= len(file_buffer):
            with open(file_name, 'rb') as data_file:
                data_file.seek(file_offset)
                file_buffer = data_file.read(buffer_size)
            file_offset += buffer_size
            buffer_offset = 0
        packed_score = file_buffer[buffer_offset:buffer_offset+8]
        buffer_offset += 8
        if not packed_score:
            break  # an empty read means end of file
        yield struct.unpack('fi', packed_score)

I'm aware of the atexit handler, but it doesn't work in my case, since this code is meant to run in a long-running process.

Rich

2 Answers


When a generator goes out of scope and is deleted, its close() method is called, which in turn raises a GeneratorExit exception inside your generator function, at the yield where it is currently paused.

Simply handle that exception:

def _read_score_index_from_disk(file_name, buffer_size=8*10000):
    # ...

    try:
        ...  # generator loop
    except GeneratorExit:
        ...  # clean up after the generator, e.g. os.remove(file_name)
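A minimal, self-contained demonstration of this mechanism follows; the names counting and events are made up for illustration. It assumes CPython, where dropping the last reference finalizes the generator immediately; the gc.collect() call is only there for interpreters without reference counting, where finalization timing is not guaranteed.

```python
import gc

events = []  # records which cleanup path ran

def counting():
    try:
        for i in range(100):
            yield i
    except GeneratorExit:
        # thrown in at the paused yield when close() is called
        events.append("GeneratorExit")
        raise  # re-raising (or simply returning) lets close() complete normally

gen = counting()
first_two = [next(gen), next(gen)]
del gen       # last reference dropped; CPython calls gen.close() here
gc.collect()  # only matters on interpreters without reference counting
print(first_two, events)  # on CPython: [0, 1] ['GeneratorExit']
```

If the generators are kept alive by a reference cycle, this cleanup is delayed until the cyclic garbage collector runs, so calling close() explicitly is more predictable than relying on scope alone.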

If you use finally: rather than except GeneratorExit:, the cleanup block also runs when any other exception propagates out of the generator (without catching that exception) and when the generator finishes normally, and you don't have to handle GeneratorExit explicitly.

Martijn Pieters
Comment: Sorry for the delay! I found myself googling this again and happened upon my question and your answer, which I had never marked as accepted. I found that both the exception handler and the finally block work, but I prefer finally, since it covers all the cases. – Rich Sep 29 '16 at 21:51

You could create a context manager out of a function to handle any clean-up tasks.

Here's a simple example of what I mean:

from contextlib import contextmanager

def my_generator():
    for i in range(10):
        if i > 5:
            break
        yield i

@contextmanager
def generator_context():
    gen = my_generator()
    try:
        yield gen
    finally:
        # runs even if the with-block raises or stops iterating early
        gen.close()
        print("cleaning up")

with generator_context() as generator:
    for value in generator:
        print(value)

Output:

0
1
2
3
4
5
cleaning up
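For the question's setup, where several generators feed a heap queue, one option is to register every generator with contextlib.ExitStack through contextlib.closing, so that all of them are closed deterministically when the with block ends, even if the merged stream is abandoned early. This is a swapped-in technique (ExitStack rather than a hand-written context manager); it assumes the heap queue is built with heapq.merge, which the question does not show, and numbers is a hypothetical stand-in for the file-reading generators.

```python
import heapq
from contextlib import ExitStack, closing

log = []  # records which generators ran their cleanup

def numbers(start, step):
    """Stand-in for a sorted file reader: yields start, start+step, ..."""
    try:
        n = start
        while True:
            yield n
            n += step
    finally:
        # in the real code, this is where the temporary file would be deleted
        log.append(start)

with ExitStack() as stack:
    sources = [stack.enter_context(closing(numbers(s, 3))) for s in (0, 1, 2)]
    merged = heapq.merge(*sources)
    first = [next(merged) for _ in range(5)]
# leaving the with block called close() on every source, newest first

print(first, log)  # [0, 1, 2, 3, 4] [2, 1, 0]
```

Because heapq.merge calls next() on every source before yielding anything, each generator has entered its try block by the time it is closed, so every finally clause runs.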
martineau