I have a problem in which I process documents from files using Python generators. The number of files I need to process is not known in advance, and each file contains records that consume a considerable amount of memory. Because of that, generators are used to process the records. Here is a summary of the code I am working on:
def process_all_records(files):
    for f in files:
        with open(f, 'r') as fd:    # close each file once it is processed
            recs = read_records(fd)
            recs_p = (process_records(r) for r in recs)
            write_records(recs_p)
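To make the pipeline concrete, here is a self-contained stand-in for the two helpers I have elided; the real read_records parses a file and yields one record object at a time, and write_records drains the generator (the one-record-per-line format here is just for illustration):

def read_records(fd):
    # Yield one record at a time so a whole file never sits in memory.
    for line in fd:
        yield line.rstrip('\n')

def write_records(recs):
    # Consuming the generator here is what actually drives the pipeline.
    for r in recs:
        if r is not None:    # process_records may return nothing for skipped records
            print(r)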
My process_records function checks the content of each record and only returns records that have a specific sender. My problem is the following: I want a count of the number of elements returned by read_records. So far I have been keeping track of the number of records in the process_records function using a list:
def process_records(r):
    if r.sender('sender_of_interest'):
        records_list.append(1)
    else:
        records_list.append(0)
    ...
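Since the list holds a 1 for each matching record and a 0 otherwise, both counts fall out of it directly:

matches = sum(records_list)              # records from 'sender_of_interest'
others = len(records_list) - matches     # records from all other senders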
The problem with this approach is that records_list can grow without bound depending on the input. I want to be able to consume the contents of records_list once it grows to a certain point and then restart the process. For example, after 20 records have been processed, I want to find out how many came from 'sender_of_interest' and how many came from other sources, and then empty the list. Can I do this without using a lock?
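Roughly, the behaviour I am after looks like this; count_record and report_counts are names I made up for the sketch, standing in for the real bookkeeping:

records_list = []

def count_record(is_match):
    records_list.append(1 if is_match else 0)
    if len(records_list) >= 20:          # flush point reached
        matches = sum(records_list)
        report_counts(matches, len(records_list) - matches)    # hypothetical reporting helper
        del records_list[:]              # empty the list and start counting again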