I have 300 million rows and 3 columns in pandas.
I want to pivot this to a wide format.
I estimate that the total memory of the current long format is about 7.2 GB.
I arrived at this by computing
300,000,000 rows * 3 columns * 8 bytes per "cell".
I want to convert to a wide format with 1.9 million rows * 1000 columns.
I estimate that the result should take about 15.2 GB (1,900,000 * 1,000 * 8 bytes per cell).
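
In case it helps, here is a minimal sketch of that arithmetic, plus how I check the real footprint using pandas' own accounting (the `df` built here is a tiny stand-in for my actual long-format frame):

```python
import numpy as np
import pandas as pd

# Back-of-envelope estimates, assuming every cell is a float64 (8 bytes):
long_bytes = 300_000_000 * 3 * 8    # 7.2e9 bytes  -> ~7.2 GB
wide_bytes = 1_900_000 * 1_000 * 8  # 1.52e10 bytes -> ~15.2 GB
print(long_bytes / 1e9, wide_bytes / 1e9)  # 7.2 15.2

# Checking the real footprint instead of estimating it; df is a small
# stand-in for my long-format frame:
df = pd.DataFrame({
    "row_id": np.arange(1_000),
    "col_id": np.arange(1_000) % 10,
    "value": np.random.rand(1_000),
})
print(df.memory_usage(deep=True).sum())  # bytes, including the index
```
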
When I pivot, memory usage climbs to 64 GB (according to the Linux resource monitor), swap usage grows to 30 GB, and then the IPython kernel dies, which I assume is an out-of-memory death.
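
For reference, the pivot call looks roughly like the following; `row_id`, `col_id`, and `value` are placeholders for my actual column names, and the frame here is a tiny stand-in:

```python
import pandas as pd

# Tiny stand-in frame; row_id / col_id / value are placeholders for my
# actual three columns.
df = pd.DataFrame({
    "row_id": [0, 0, 1, 1],
    "col_id": ["a", "b", "a", "b"],
    "value": [1.0, 2.0, 3.0, 4.0],
})

# The call that blows up at full scale:
wide = df.pivot(index="row_id", columns="col_id", values="value")
print(wide)
```

(`pd.pivot_table` behaves similarly but also aggregates duplicate (row, column) pairs, with `aggfunc="mean"` by default, so it does strictly more work.)
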
Am I correct that generating the pivot table is what spikes RAM usage beyond the 64 GB my desktop has? Why does pivoting require so much more memory than the estimated 15.2 GB result?