The CSV file here is about 500MB, with 2.7 million rows. Through extensive testing, I have verified that the statement 'for row in read_new:' quadruples memory consumption, and I can't understand why. The memory increase happens exactly at that for statement, not before or after it.
Can anyone shed some light on why this is happening?
I understand there are better ways to execute this script, but I have my reasons for doing it this way. I'm just trying to figure out why this is happening, and whether there is perhaps a more appropriate buffer than StringIO() for this purpose. Here is the script:
import io
import csv
import time
filename = 'rcs_batch_032519.csv'
csv_fob = open(filename, 'r')
fix_fob = io.StringIO()
reader = csv.reader(csv_fob)
writer = csv.writer(fix_fob)
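# Copy every row from the source file into the in-memory StringIO buffer.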
for row in reader:
    writer.writerow(row)
fix_fob.seek(0)
read_new = csv.reader(fix_fob)
# Memory explodes here, from 634MB to 2.36GB, after executing the 'for' statement
for row in read_new:
    time.sleep(30)  # pause so memory usage can be inspected while the loop runs
    pass
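For reference, here is a minimal sketch of how the jump can be observed step by step. I'm using psutil to read the process RSS purely for illustration (my actual numbers came from watching the process in a system monitor), so treat the measurement helper as an assumption:

import csv
import io

import psutil  # third-party: pip install psutil (assumption; any RSS monitor works)

def rss_mb():
    # Resident set size of the current process, in MB.
    return psutil.Process().memory_info().rss / 1024 ** 2

fix_fob = io.StringIO()
writer = csv.writer(fix_fob)

with open('rcs_batch_032519.csv', 'r') as csv_fob:
    for row in csv.reader(csv_fob):
        writer.writerow(row)
print(f'after copying into StringIO: {rss_mb():.0f} MB')

fix_fob.seek(0)
read_new = csv.reader(fix_fob)
print(f'after creating the second reader: {rss_mb():.0f} MB')

for row in read_new:
    # The jump shows up as soon as this loop starts pulling rows.
    print(f'inside the first iteration: {rss_mb():.0f} MB')
    break

With this instrumentation, the only step where RSS moves for me is the final loop, which is what the comment in the script above is referring to.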