Append data to HDF5 file with Pandas, Python

Question

I have large pandas DataFrames with financial data. I have no problem appending and concatenating additional columns and DataFrames to my .h5 file.

The financial data is being updated every minute, I need to append a row of data to all of my existing tables inside of my .h5 file every minute.

Here is what i have tried so far, but no matter what i do, it overwrites the .h5 file and does not just append data.

HDFStore way:

#we open the hdf5 file
save_hdf = HDFStore('test.h5') 

ohlcv_candle.to_hdf('test.h5')

#we give the dataframe a key value
#format=table so we can append data
save_hdf.put('name_of_frame',ohlcv_candle, format='table',  data_columns=True)

#we print our dataframe by calling the hdf file with the key
#just doing this as a test
print(save_hdf['name_of_frame'])

The other way I have tried it, to_hdf:

#format=t so we can append data , mode=r+ to specify the file exists and
#we want to append to it
tohlcv_candle.to_hdf('test.h5',key='this_is_a_key', mode='r+', format='t')

#again just printing to check if it worked 
print(pd.read_hdf('test.h5', key='this_is_a_key'))

Here is what one of the DataFrames looks like after being read_hdf:

           time     open     high      low    close     volume           PP  
0    1505305260  3137.89  3147.15  3121.17  3146.94   6.205397  3138.420000   
1    1505305320  3146.86  3159.99  3130.00  3159.88   8.935962  3149.956667   
2    1505305380  3159.96  3160.00  3159.37  3159.66   4.524017  3159.676667   
3    1505305440  3159.66  3175.51  3151.08  3175.51   8.717610  3167.366667   
4    1505305500  3175.25  3175.53  3170.44  3175.53   3.187453  3173.833333

The next time I am getting data (every minute), i would like a row of it added to index 5 of all my columns..and then 6 and 7 ..and so on, without having to read and manipulate the entire file in memory as that would defeat the point of doing this. If there is a better way of solving this, do not be shy to recommend it.

P.S. sorry for the formatting of that table in here

MaxU - stand with Ukraine · Accepted Answer · 2019-01-23T12:04:40.187

17

pandas.HDFStore.put() has parameter append (which defaults to False) - that instructs Pandas to overwrite instead of appending.

So try this:

store = pd.HDFStore('test.h5')

store.append('name_of_frame', ohlcv_candle, format='t',  data_columns=True)

we can also use store.put(..., append=True), but this file should also be created in a table format:

store.put('name_of_frame', ohlcv_candle, format='t', append=True, data_columns=True)

NOTE: appending works only for the table (format='t' - is an alias for format='table') format.

edited Jan 23 '19 at 12:04

answered Sep 13 '17 at 20:32

MaxU - stand with Ukraine

205,989
36
386
419

That is it, thank you very much for the quick answer ! – Karl Sep 13 '17 at 21:01
@Karl, glad i could help :) – MaxU - stand with Ukraine Sep 13 '17 at 21:02
could you add some explanation why `append` works and `put` does not? – Mayou36 Oct 20 '17 at 11:12
1

It's important to note that appending works only with `format = 'table'`. The default 'fixed' format is not-appendable (and not searchable) - see https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.to_hdf.html – Pawel Kranzberg Jan 23 '19 at 11:20
How can we ignore index while using `append`? – FinThusiast Apr 05 '22 at 08:53

score 3 · Answer 2 · answered May 22 '18 at 06:28

tohlcv_candle.to_hdf('test.h5',key='this_is_a_key', append=True, mode='r+', format='t')

You need to pass another argument append=True to specify that the data is to be appended to existing data if found under that key, instead of over-writing it.

Without this, the default is False and if it encounters an existing table under 'this_is_a_key' then it overwrites.

The mode= argument is only at file-level, telling whether the file as a whole is to be overwritten or appended.

One file can have any number of keys, so a mode='a', append=False setting will mean only one key gets over-written while the other keys stay.

I had a similar experience as yours and found the additional append argument in the reference doc. After setting it, now it's appending properly for me.

Ref: https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.to_hdf.html

Note: hdf5 won't bother doing anything with the dataframe's indexes. We need to iron those out before putting the data in or when we take it out.

Append data to HDF5 file with Pandas, Python

2 Answers2

Linked