
I have 300 million rows and 3 columns in pandas.
I want to pivot this to a wide format. I estimate that the total memory of the current long format is 9.6 GB. I arrived at this by doing 300,000,000 rows * 4 * 8 bytes per "cell" (3 value columns plus the index).

I want to convert to a wide format with 1.9 million rows * 1000 columns.

I estimate that the wide format should take 15.2 GB (1,900,000 * 1,000 * 8 bytes).
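For reference, here is a minimal sketch of how the estimate can be checked against the actual in-memory size (the column names are placeholders, and the frame is assumed to be all int64/float64):

```python
import numpy as np
import pandas as pd

# Scaled-down stand-in for the long-format frame (~300M rows, 3 columns in the real data).
n_rows = 3_000_000
long_df = pd.DataFrame({
    "row_id": np.random.randint(0, 1_900_000, size=n_rows),
    "col_id": np.random.randint(0, 1_000, size=n_rows),
    "value": np.random.rand(n_rows),
})

# Actual bytes held in memory, including the index.
print(long_df.memory_usage(deep=True).sum() / 1e9, "GB")
```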

When I pivot, memory usage climbs to 64 GB (per the Linux resource monitor), swap usage grows to 30 GB, and then the IPython kernel dies, which I assume is an out-of-memory death.

Am I correct that, while the pivot table is being generated, RAM usage will spike beyond the 64 GB my desktop has? Why does generating a pivot table exceed system RAM?

user798719
  • Why are you estimating the size of your current table? That sounds like you have a CSV, not a pandas DataFrame. In my experience, 2 GB of CSV data took ~14 GB to load and 3 GB to keep in memory after loading, and that was after optimizing. Without optimization I'd exceed 24 GB of RAM/swap and Python would crash. – TemporalWolf Aug 07 '17 at 09:44
  • You may want to take a look at [this](https://stackoverflow.com/questions/29439589/how-to-create-a-pivot-table-on-extremely-large-dataframes-in-pandas). – Nullman Aug 07 '17 at 09:57

1 Answer


If you're using DataFrame.pivot_table(), try DataFrame.pivot() instead: it has much lower memory consumption and is also faster. This only works if you aren't using a custom aggregation function to build your pivot table and if the combination of columns you're pivoting on contains no duplicate (redundant) pairs.
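A minimal sketch of the difference (the column names below are placeholders, not taken from the question):

```python
import pandas as pd

# Long format: one row per (row_id, col_id) pair.
long_df = pd.DataFrame({
    "row_id": [0, 0, 1, 1],
    "col_id": ["a", "b", "a", "b"],
    "value":  [1.0, 2.0, 3.0, 4.0],
})

# pivot() only reshapes; it raises a ValueError if any (row_id, col_id) pair is duplicated.
wide = long_df.pivot(index="row_id", columns="col_id", values="value")

# pivot_table() groups and aggregates duplicates (mean by default),
# which builds intermediate grouping structures and costs extra memory and time.
# wide = long_df.pivot_table(index="row_id", columns="col_id",
#                            values="value", aggfunc="mean")
```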