I have a pandas DataFrame whose index is as follows (note the Freq: H):
<class 'pandas.tseries.index.DatetimeIndex'>
[2011-01-01 00:00:00, ..., 2013-12-31 23:00:00]
Length: 26304, Freq: H, Timezone: None
There are multiple columns, but the first few rows (and others scattered throughout) contain only NA entries. If I write this to an HDF file like this:
hdfstore.put('/table', df, format='table', data_columns=True, append=False)
and then read it back with:
df = hdfstore['/table']
and look at the index, I see:
<class 'pandas.tseries.index.DatetimeIndex'>
[2011-01-11 04:00:00, ..., 2013-12-31 23:00:00]
Length: 24656, Freq: None, Timezone: None
Notice that the Freq is now None, and that there are fewer rows and a later start date-time. The first row is now the first row of the original DataFrame that contains at least one non-NA value, so rows that are entirely NA appear to have been dropped.
Firstly, is this expected behaviour, due to limitations of the HDF5 format and how DataFrames are stored, or is it a bug?
Secondly, is there a clean way to avoid this, or do I just need to fix up the index after loading? I'm not sure what the best way to do that is either.
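For reference, here is a minimal sketch of the two options I have in mind. It rests on assumptions: the helper names put_keeping_na_rows and restore_hourly_index are my own, the dropna argument to HDFStore.put may depend on the pandas version, and the round trip through the store is only mimicked here with dropna(how='all') on a small made-up frame.

```python
import numpy as np
import pandas as pd


def put_keeping_na_rows(store, key, frame):
    # Hypothetical helper. Assumption: this pandas version's HDFStore.put
    # accepts dropna; dropna=False asks the table writer to keep rows
    # that are entirely NA instead of discarding them.
    store.put(key, frame, format='table', data_columns=True, dropna=False)


def restore_hourly_index(loaded, start, end):
    # Hypothetical helper. Rebuild the full hourly range and reindex onto it:
    # rows missing from `loaded` come back filled with NaN, and the
    # resulting index carries an hourly freq again.
    full_index = pd.date_range(start, end, freq=pd.offsets.Hour())
    return loaded.reindex(full_index)


# Stand-in for the write/read round trip, using a small made-up hourly frame.
index = pd.date_range('2011-01-01', periods=6, freq=pd.offsets.Hour())
original = pd.DataFrame(
    {'a': [np.nan, np.nan, 1.0, np.nan, 2.0, 3.0],
     'b': [np.nan, np.nan, 4.0, np.nan, np.nan, 5.0]},
    index=index,
)
loaded = original.dropna(how='all')  # mimics what came back from the store
restored = restore_hourly_index(loaded, index[0], index[-1])
```

The reindex approach only recovers the original frame if every missing row really was all-NA, and it needs the original start and end timestamps, which I would have to store separately.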