I have sensor data recorded over a timespan of one year. The data is stored in twelve chunks, with 1000 columns, ~1000000 rows each. I have worked out a script to concatenate these chunks to one large file, but about half way through the execution I get a MemoryError
. (I am running this on a machine with ~70 GB of usable RAM.)
import gc
from os import listdir
import pandas as pd
path = "/slices02/hdf/"
slices = listdir(path)
res = pd.DataFrame()
for sl in slices:
temp = pd.read_hdf(path + f"{sl}")
res = pd.concat([res, temp], sort=False, axis=1)
del temp
gc.collect()
res.fillna(method="ffill", inplace=True)
res.to_hdf(path + "sensor_data_cpl.hdf", "online", mode="w")
I have also tried to fiddle with HDFStore
so I do not have to load all the data into memory (see Merging two tables with millions of rows in Python), but I could not figure out how that works in my case.