I'm using Pandas, and making a HDFStore object. I calculate 500 columns of data, and write it to a table format HDFStore object. Then I close the file, delete the data from memory, do the next 500 columns (labelled by an increasing integer), open up the store, and try to append the new columns. However, it doesn't like this. It gives me an error

invalid combinate of [non_index_axes] on appending data [[(1, [500, 501, 502, ...])]] vs current table [[(1, [0, 1, 2, ...])]]

I'm assuming it only allows appending of more rows not columns. So how do I add more columns?
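A minimal sketch of what I'm doing (file and node names made up, column counts shrunk for illustration):

```python
import numpy as np
import pandas as pd

# First batch: columns 0-499
store = pd.HDFStore('calc.h5')
first = pd.DataFrame(np.random.randn(100, 500), columns=range(500))
store.append('results', first)
store.close()

# Later, after recomputing: columns 500-999
store = pd.HDFStore('calc.h5')
second = pd.DataFrame(np.random.randn(100, 500), columns=range(500, 1000))
err = None
try:
    store.append('results', second)   # raises the error above
except ValueError as e:
    err = e
store.close()
print(err)
```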

StevenMurray
  • You'll have to add the columns to a new node. `store['node1'] = df1` then later `store['node2'] = df2` – Zelazny7 Apr 11 '13 at 02:17
  • HDFStore (and HDF5 in general) are row oriented. You will want to append on rows and make it your longest dimension. As Zelazny7 indicates you can add columns by creating another node, keeping in mind that you need to keep these synchronized yourself (IOW they should have the same row indices), see: http://pandas.pydata.org/pandas-docs/dev/io.html#multiple-table-queries – Jeff Apr 11 '13 at 10:06
  • One workaround for this might be to store your dataframe transposed: write your 500 columns as 500 _rows_ instead, then append the next 500, and so on. When you read the dataframe back in, you'll just have to transpose it to get the format you expect. This seems less likely to produce errors than storing all columns separately. – Nathan Jul 18 '17 at 16:17
  • Please add code showing your issue to get a good usable answer. – snow_abstraction Apr 13 '19 at 09:49
  • @Nathan This is an interesting workaround but a very bad idea unless the columns are all of the same type. If you have different types then transposing will mean all the columns have object type. – JohnE Mar 02 '21 at 16:38
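The transpose workaround from the comments can be sketched like this (file and node names are made up; as noted, it assumes every column has the same dtype):

```python
import numpy as np
import pandas as pd

store = pd.HDFStore('batches.h5')

# Each batch of 500 columns is written as 500 *rows*, so the row axis
# (the only axis HDFStore can grow) is the one that gets extended.
batch1 = pd.DataFrame(np.random.randn(100, 500), columns=range(500))
batch2 = pd.DataFrame(np.random.randn(100, 500), columns=range(500, 1000))
store.append('data', batch1.T)
store.append('data', batch2.T)

# Transpose back on read: 100 rows x 1000 columns
result = store['data'].T
store.close()
```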

2 Answers

Your existing table has the column labels [0, 1, 2, ...], but you are trying to append a DataFrame with different column labels [500, 501, 502, ...]. `append` requires the incoming columns to match the table already on disk.
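To illustrate (file and node names are made up): `append` succeeds only when the incoming frame reuses the labels already on disk, and even then it grows the table by rows, not columns:

```python
import numpy as np
import pandas as pd

store = pd.HDFStore('example.h5')
df1 = pd.DataFrame(np.random.randn(10, 3), columns=[0, 1, 2])
store.append('tab', df1)            # table on disk has columns 0, 1, 2

df2 = pd.DataFrame(np.random.randn(10, 3), columns=[3, 4, 5])
# store.append('tab', df2)          # would raise: columns don't match

# Relabelling makes the append succeed, but the result is more rows:
store.append('tab', df2.set_axis([0, 1, 2], axis=1))
combined = store['tab']             # 20 rows x 3 columns
store.close()
```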

tlentali

An HDF5 table has a fixed column structure, so you cannot easily add a column. A workaround is to read the pieces back as separate DataFrames, concatenate them, and re-write the result to the HDF5 file.

```python
import pandas as pd

hdf5_files = ['data1.h5', 'data2.h5', 'data3.h5']

df_list = []
for file in hdf5_files:
    df = pd.read_hdf(file)
    df_list.append(df)

# axis=1 glues the pieces together column-wise (each file holds a
# different batch of columns for the same rows)
result = pd.concat(df_list, axis=1)

# You can now use the result DataFrame to access all of the data
# from the HDF5 files
```

Does this solve your problem?

Keep in mind that HDF5 is not designed for efficient append operations; if you need to frequently add new columns to your data, you should consider a database system instead, imho.

Lorenzo Bassetti