
UPDATE: I realized that every new run was creating a new Python console, which was causing the extra memory consumption. I had to turn off the setting that creates a new console for each run; it got enabled automatically when I upgraded to PyCharm Pro for some reason. Now memory consumption is steady.

My project creates a CSV named 'pressure_drop.csv' and I want to create a new pandas DataFrame from it using the code below. The pressure_drop.csv in this example has 10150 rows and 12 columns. As you can see, I delete some columns that don't need to be shown and then create a DataFrame by assigning a row and column index. Finally, it is written to a new, more readable .csv file that I will use to create interactive charts etc.

The problem is that Python takes up more memory every time the code is run in the console, and it eventually crashes if the code is run enough times. Can you help me understand why this is happening?

For example, Python takes up roughly 100 MB more memory every time the code is run on the data set above.

import numpy as np
import pandas as pd

import columns  # project module that holds the output column names

def data_frame_creator(result_array):
    # results_csv_loader() is a helper defined elsewhere in the project that
    # loads the results CSV into a NumPy array.
    array = results_csv_loader(result_array)
    # Drop the columns that don't need to be shown.
    array = np.delete(array, [3, 4, 5, 6, 7], 1)
    row_count = array.shape[0] + 1
    # Build the DataFrame with a 1-based row index and named columns.
    df = pd.DataFrame(data=array, index=np.arange(1, row_count),
                      columns=columns.dataframe_columns)
    df.to_csv('Output.csv')

data_frame_creator('pressure_drop.csv')
  • Don't say 'Python' when you mean 'pandas', in particular `pd.to_csv()`. Also, obviously don't try to run multiple instances writing the same file at the same time. – smci Apr 29 '20 at 23:16
  • **Please show us a 5-line snippet of `result_array`** (post as text, not screenshot). Without you posting `result_array`, [the usual standard advice applies about handling huge CSVs](https://stackoverflow.com/a/40820097/202229): specify the dtype for each category, so you don't waste huge amounts of memory by reading e.g. datetimes as unique strings. – smci Apr 29 '20 at 23:22
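For illustration, specifying dtypes up front might look roughly like the sketch below; the column positions and types are placeholders, not taken from the actual pressure_drop.csv, and would need to match the real file:

import pandas as pd

# Placeholder dtypes: with header=None the column labels are integer
# positions, so the dict keys below refer to column positions.
dtypes = {0: 'float32', 1: 'float32', 2: 'int32'}
df = pd.read_csv('pressure_drop.csv', header=None, dtype=dtypes)
print(df.dtypes)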

1 Answer


It's a little hard to know what you're trying to do without knowing what the dataframes look like and what columns you want. Perhaps the function you're looking for is read_csv? E.g.:

input_df = pd.read_csv('pressure_drop.csv', usecols=[0, 1, 2, 8, 9, 10, 11])
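Building on that, here is a minimal sketch of how the whole function could be rewritten around read_csv. The header=None assumption, the column positions, and the placeholder column names are illustrative and would need to be adjusted to the real project:

import pandas as pd

def data_frame_creator(result_csv, column_names, output_csv='Output.csv'):
    # header=None assumes the raw CSV has no header row; remove it if it does.
    # usecols keeps only the columns you want to show (positions are illustrative).
    df = pd.read_csv(result_csv, header=None, usecols=[0, 1, 2, 8, 9, 10, 11])
    df.columns = column_names          # e.g. your columns.dataframe_columns list
    df.index = range(1, len(df) + 1)   # 1-based row index, like np.arange(1, row_count)
    df.to_csv(output_csv)

# Placeholder column names, purely for illustration:
data_frame_creator('pressure_drop.csv', ['col_%d' % i for i in range(1, 8)])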
Alexandre Daly
  • I think I found the culprit. Every time I run the code it creates a new console tab in PyCharm. I am trying to find the setting that disables that feature. Memory usage goes back down when I close the tabs manually. – bachree Apr 26 '20 at 17:50