I am continuously writing simulation output data to an HDFStore that has become quite big (~15 GB). Additionally, I get the following performance warning:
/home/extern/fsalah/.local/lib/python2.7/site-packages/tables/group.py:501: PerformanceWarning: group ``/`` is exceeding the recommended maximum number of children (16384); be ready to see PyTables asking for *lots* of memory and possibly slow I/O.
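To quantify how many children the root group actually has, I can read the count directly with PyTables; a minimal sketch, assuming the file is the store.hdf5 from the snippet below:

import tables

# Count the children of the root group and compare against the
# recommended maximum (16384) mentioned in the warning.
with tables.open_file("store.hdf5", mode="r") as h5:
    print(h5.root._v_nchildren)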
What I am experiencing is that it takes around 30 seconds to create a new child with a small dataset (100 rows, 4 columns). However, this only happens the first time I create a child after opening the HDFStore, and only if that child does not already exist. After adding the first new child to this HDFStore, adding further children is fast (<0.1 sec). I can easily reproduce this behavior by closing and reopening the HDFStore. I am running the following code snippet:
import numpy as np
import pandas as pd

# x, y and z hold the simulation results for this run
databaseName = "store.hdf5"
store = pd.HDFStore(databaseName, complib='zlib', complevel=9)
timeslots = np.arange(0, 100)
df = pd.DataFrame({'Timeslot': timeslots,
                   'a': [x[t] for t in timeslots],
                   'b': [y[t] for t in timeslots],
                   'c': np.repeat(z, len(timeslots))})
tableName = "runX"
store.put(tableName, df, data_columns=['Timeslot', 'a', 'b', 'c'])
# some more puts with different table names follow here
store.close()
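To show the effect in numbers, this is roughly how I time two consecutive puts right after opening the store (the run names and the random data are just placeholders for my real simulation output):

import time

import numpy as np
import pandas as pd

store = pd.HDFStore("store.hdf5", complib='zlib', complevel=9)
df = pd.DataFrame({'Timeslot': np.arange(100),
                   'a': np.random.rand(100),
                   'b': np.random.rand(100),
                   'c': np.random.rand(100)})

t0 = time.time()
store.put("run_first", df, data_columns=['Timeslot', 'a', 'b', 'c'])
print("first put:  %.1f s" % (time.time() - t0))   # ~30 s right after opening

t0 = time.time()
store.put("run_second", df, data_columns=['Timeslot', 'a', 'b', 'c'])
print("second put: %.1f s" % (time.time() - t0))   # <0.1 s

store.close()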
Now my questions are:
Why do these performance issues show up only the first time I add a child after opening the store, and not on every put?
What is the reason for that problem?
I experience the same issue, scaled down (~5 sec), for an HDFStore of ~3 GB with far fewer children (in that case I don't get any performance warnings).
[I searched for this topic but couldn't find any similar questions]