Hello, I have some code that uses a very large amount of memory. regressor_df is a DataFrame with over 14 million elements. When I remove location from the groupby, the RAM needed to process it drops by about 26 GB. How can I run this groupby more efficiently, or get it to release memory so that other parts of the script can use the system resources? This data goes into a machine learning model.

    regressor_prediction = regressor_df.groupby(['ds', 'item', 'location']).mean()[regressor].reset_index().fillna(0.)
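For context, here is a minimal sketch of the setup (the synthetic data, the dtypes, and regressor being a single column name are my assumptions; the real frame has over 14 million elements):

    import numpy as np
    import pandas as pd

    # Synthetic stand-in for the real data: one row per (ds, item, location) observation.
    n = 100_000  # the real frame is far larger
    regressor_df = pd.DataFrame({
        'ds': pd.to_datetime('2023-01-01') + pd.to_timedelta(np.random.randint(0, 365, n), unit='D'),
        'item': np.random.randint(0, 5_000, n).astype(str),
        'location': np.random.randint(0, 500, n).astype(str),
        'price': np.random.rand(n),  # assumed name for the regressor column
    })
    regressor = 'price'

    # The line in question: mean of the regressor per (ds, item, location) combination.
    regressor_prediction = (
        regressor_df.groupby(['ds', 'item', 'location']).mean()[regressor].reset_index().fillna(0.)
    )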
I tried chunking the data, but that did not solve it. I can get past the groupby, but the amount of RAM in use stays high, and the script gets killed when later parts of the code start working on other DataFrames because there is not enough memory left.
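Roughly what my chunking attempt looked like (a sketch from memory using the same regressor_df and regressor as above; the split by item and the number of chunks are illustrative, not the exact code):

    # Split by item so every (ds, item, location) group falls entirely inside one chunk,
    # aggregate each chunk separately, then concatenate the per-chunk results.
    items = regressor_df['item'].unique()
    chunks = []
    for item_batch in np.array_split(items, 20):
        part = regressor_df[regressor_df['item'].isin(item_batch)]
        chunks.append(
            part.groupby(['ds', 'item', 'location'])[regressor].mean().reset_index()
        )
        del part  # try to give the chunk's memory back before the next iteration
    regressor_prediction = pd.concat(chunks, ignore_index=True).fillna(0.)

This gets through the groupby itself, but overall memory usage stays high afterwards.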