Can the frequency of a Pandas tseries DatetimeIndex be preserved when writing to an HDFStore?

Question

I have a pandas DataFrame whose index is as follows (note the Freq: H):

<class 'pandas.tseries.index.DatetimeIndex'>
[2011-01-01 00:00:00, ..., 2013-12-31 23:00:00]
Length: 26304, Freq: H, Timezone: None

There are multiple columns, but the first few rows (and others scattered throughout) contain only NA entries. If I write this to an HDF file like this:

hdfstore.put('/table', df, format='table', data_columns=True, append=False)

and then read it back with:

df = hdfstore['/table']

and look at the index, I see:

<class 'pandas.tseries.index.DatetimeIndex'>
[2011-01-11 04:00:00, ..., 2013-12-31 23:00:00]
Length: 24656, Freq: None, Timezone: None

Notice that the Freq is now None, and also that there are fewer rows and a later start date-time. The first row is now the first row of the original DataFrame that contains at least one non-NA column value.

Firstly, is this expected behaviour due to limitations of the HDF5 format and how DataFrames are stored, or a bug?

Is there a clean way to avoid this happening, or do I just need to fix up the index after loading? I am not sure what the best way to do that is either.

One quick-and-dirty work-around is to just add a dummy column containing all 0s. Then, upon reload, the Freq of the DatetimeIndex is preserved. Obviously, that has unnecessary storage overhead. — DavidJ, May 07 '14 at 17:30

score 1 · Accepted Answer · answered May 07 '14 at 17:44

There is an option, introduced in 0.13.1 (might have been 0.13.0), where you can set dropna=False on a put/append to avoid dropping all-NaN rows. The dropping is done for efficiency: most of the time, say when storing a Panel, you have lots of all-NaN rows and no reason to store them.
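As a rough sketch of the dropna=False suggestion, the snippet below writes a small hourly frame whose first rows are all NaN and reads it back. The helper names (roundtrip_keep_nan_rows, demo), the file name, and the toy data are made up for illustration; it assumes PyTables is installed and a pandas version whose put accepts dropna.

```python
import os
import tempfile

import numpy as np
import pandas as pd


def roundtrip_keep_nan_rows(df, path, key="/table"):
    """Write df in table format without dropping all-NaN rows, then read it back."""
    with pd.HDFStore(path, mode="w") as store:
        # dropna=False asks the store to keep rows whose values are all NaN
        store.put(key, df, format="table", data_columns=True, dropna=False)
        return store[key]


def demo():
    """Round-trip a toy hourly frame (hypothetical data) with leading all-NaN rows."""
    index = pd.date_range("2011-01-01", periods=24, freq=pd.Timedelta(hours=1))
    df = pd.DataFrame({"a": np.arange(24.0), "b": np.arange(24.0)}, index=index)
    df.iloc[:5] = np.nan  # leading all-NaN rows, as in the question
    with tempfile.TemporaryDirectory() as tmp:
        loaded = roundtrip_keep_nan_rows(df, os.path.join(tmp, "demo.h5"))
    return df, loaded
```

Calling demo() and comparing len(loaded) with len(df) is a quick way to check that no rows were dropped on your own pandas version.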

Apart from that, the frequency information will be preserved. Note, however, that it will NOT be preserved if you append multiple times.

You can always call pd.infer_freq(an_index) if you need to re-infer the frequency (if possible). Normally this is done automatically in any event if needed.
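If the index does come back with Freq: None, one possible way to fix it up after loading is sketched below. restore_freq is a hypothetical helper, and the "loaded" frame is a stand-in built in memory rather than read from a real store. pd.infer_freq returns None when the spacing is irregular (for example, when rows were dropped), in which case the frame is returned unchanged.

```python
import numpy as np
import pandas as pd


def restore_freq(df):
    """Return df with its DatetimeIndex frequency re-inferred, if possible."""
    inferred = pd.infer_freq(df.index)  # None when the spacing is irregular
    if inferred is None:
        return df
    # asfreq reindexes onto a regular range, so the result's index carries the freq
    return df.asfreq(inferred)


# Hypothetical stand-in for a frame read back from the store with Freq: None
hourly = pd.date_range("2011-01-01", periods=48, freq=pd.Timedelta(hours=1))
loaded = pd.DataFrame({"a": np.arange(48.0)}, index=pd.DatetimeIndex(hourly.values))

fixed = restore_freq(loaded)
```

Here fixed.index.freq is hourly again, while a frame with gaps in its index would pass through untouched.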

Jeff

Thanks Jeff - works like a charm (0.13.1). Now to document it. – DavidJ May 07 '14 at 18:02
docs are here (for a different function), and the docstring: http://pandas-docs.github.io/pandas-docs-travis/io.html#multiple-table-queries. Would welcome a short blurb maybe in the beginning sections where table format is mentioned (in a warning/notes block). PR pls! – Jeff May 07 '14 at 18:19
