
The HDF5 format apparently does not support categoricals with format="fixed". The following example

import pandas as pd

s = pd.Series(['a', 'b', 'a', 'b'], dtype='category')
s.to_hdf('s.h5', 's')

raises the error:

NotImplementedError: Cannot store a category dtype in a HDF5 dataset that uses format="fixed". Use format="table".

How do I construct the categorical series with format='table'?


1 Answer


Specify format='table' or format='t' in pd.Series.to_hdf:

s.to_hdf('s.h5', key='s', format='t')
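A minimal round-trip sketch showing that the category dtype survives with format='table'. It assumes PyTables (the `tables` package) is installed, since pandas needs it for HDF5 I/O, and it writes to a throwaway temporary file instead of 's.h5' in the working directory:

```python
import os
import tempfile

import pandas as pd

# Categorical Series from the question
s = pd.Series(['a', 'b', 'a', 'b'], dtype='category')

fixed_error = None
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 's.h5')  # temporary file, deleted afterwards

    # The default format='fixed' refuses the category dtype
    try:
        s.to_hdf(path, key='s')
    except NotImplementedError as err:
        fixed_error = err
        print('fixed failed:', err)

    # format='table' (or 't') can store it; mode='w' starts from a fresh file
    s.to_hdf(path, key='s', format='table', mode='w')
    restored = pd.read_hdf(path, 's')

print(restored.dtype)                     # category
print(restored.cat.categories.tolist())   # ['a', 'b']
```

The values come back as a categorical again, not as plain strings.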

Note that this is also what the error message advises. As per the docs:

format : ‘fixed(f)|table(t)’, default is ‘fixed’

fixed(f) : Fixed format. Fast writing/reading. Not-appendable, nor searchable

table(t) : Table format. Write as a PyTables Table structure which may perform worse but allow more flexible operations like searching / selecting subsets of the data
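To illustrate the "appendable" and "searchable" parts of that description, here is a small sketch (again assuming PyTables is installed and using a temporary file). It appends a second chunk to a table-format categorical Series and then reads back only a subset with a `where` query on the index. Note that the appended chunk must use the same categories as the data already stored:

```python
import os
import tempfile

import pandas as pd

first = pd.Series(['a', 'b', 'a', 'b'], dtype='category')
# Second chunk continues the integer index and reuses the same categories
second = pd.Series(['b', 'a'], index=[4, 5],
                   dtype=pd.CategoricalDtype(['a', 'b']))

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 's.h5')

    # A table-format node can be appended to; a fixed one cannot
    first.to_hdf(path, key='s', format='table', mode='w')
    second.to_hdf(path, key='s', format='table', append=True)

    everything = pd.read_hdf(path, 's')
    # Table format is also queryable: read back only rows with index >= 3
    tail = pd.read_hdf(path, 's', where='index >= 3')

print(everything.tolist())  # ['a', 'b', 'a', 'b', 'b', 'a']
print(tail.tolist())        # ['b', 'b', 'a']
```

With format='fixed' neither the append nor the `where` selection is possible; the whole object is written and read in one piece.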

OMG. I searched for a solid hour, trying to figure out how to table-format categories... *in the series*. But you're right, that's what the error message says, now that I'm reading it correctly. – Autumn May 04 '18 at 14:17