
I'm running into problems when sending a DataFrame to HDF5 in small chunks via pd.HDFStore('mystore.h5', mode='a').append(my_frame, chunk). The chunks all have the same columns and types (they come from the same DataFrame), but it works for a lot of chunks and then bombs halfway through:

ValueError: cannot match existing table structure for [Net_Bal_Amt,Loan_Current_Rate] on appending data

I printed out the DataFrame chunks that caused this failure, and the one thing they have in common is that a specific column contains only None values (they are null in the source). I'm not sure how to correct this. The values should stay None, NaN, or null, as long as they remain empty. Thanks.
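The observation above can be reproduced without HDF5. This is a minimal sketch that assumes a recent pandas rather than the 0.15.2 release in the question and uses made-up values: when every value of a column in a chunk is None, pandas has no numbers to infer from and gives that column object dtype, while a chunk with at least one number in the column gets float64. HDFStore.append then sees a different dtype than the one the table was created with.

```python
import pandas as pd

# Two chunks with the same columns (illustrative values, not the asker's data).
chunk_ok = pd.DataFrame({
    "Net_Bal_Amt": [100.0, 250.5],
    "Loan_Current_Rate": [0.045, None],   # one NULL, still inferred as float64
})
chunk_all_null = pd.DataFrame({
    "Net_Bal_Amt": [75.25, 10.0],
    "Loan_Current_Rate": [None, None],    # every value NULL in the source
})

# Columns whose inferred dtype differs between the two chunks.
mismatched = [
    col for col in chunk_ok.columns
    if chunk_ok[col].dtype != chunk_all_null[col].dtype
]
print(chunk_ok.dtypes)
print(chunk_all_null.dtypes)
print(mismatched)  # ['Loan_Current_Rate']
```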

Traceback (most recent call last):
  File "[...]\Anaconda3\lib\site-packages\pandas\io\pytables.py", line 3381, in create_axes
    b, b_items = by_items.pop(items)
KeyError: ('Net_Bal_Amt', 'Loan_Current_Rate')

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "[...]\crd_test.py", line 8, in <module>
    credit.CRD.hdf_install(overwrite=True, tablenames=['loans_uscrd', 'loans_uscrd_a'])
  File "[...]\credit_base.py", line 62, in hdf_install
    cls._hdf_creation(map_)
  File "[...]\credit_base.py", line 80, in _hdf_creation
    cls._hdf_processing(v, chunk)
  File "[...]\credit_base.py", line 88, in _hdf_processing
    cls.crd.append(frame, chunk)   
  File "[...]\Anaconda3\lib\site-packages\pandas\io\pytables.py", line 903, in append
    **kwargs)
  File "[...]\lib\site-packages\pandas\io\pytables.py", line 1259, in _write_to_group
    s.write(obj=value, append=append, complib=complib, **kwargs)
  File "[...]\lib\site-packages\pandas\io\pytables.py", line 3751, in write
    **kwargs)
  File "[...]\Anaconda3\lib\site-packages\pandas\io\pytables.py", line 3388, in create_axes
    item in items))
ValueError: cannot match existing table structure for [Net_Bal_Amt,Loan_Current_Rate] on appending data

dtypes:

pd.read_hdf(r'[...]\crd_test.h5','loans').dtypes
Out[4]: 
Customer_Id                  object
As_of_Date           datetime64[ns]
Net_Bal_Amt                 float64
Loan_Current_Rate           float64
dtype: object

versions: PyTables 3.1.1, pandas 0.15.2, Python 3.4

dtypes of the chunk being appended when it crashes:

Customer_Id                  object
As_of_Date           datetime64[ns]
Net_Bal_Amt                 float64
Loan_Current_Rate            object
dtype: object
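The two dtype listings differ only in Loan_Current_Rate (float64 in the store, object in the failing chunk). A minimal sketch of one fix, assuming a recent pandas (DataFrame.astype with a dict needs 0.19 or later) and a hypothetical helper named `align_to_stored_dtypes`: cast each chunk to the dtypes of the existing table before calling append. An all-None column cast to float64 becomes all NaN, so the values stay empty. In practice the stored dtypes could come from `pd.read_hdf(path, key).dtypes` as shown above; here they are built by hand with illustrative values.

```python
import pandas as pd


def align_to_stored_dtypes(chunk: pd.DataFrame, stored_dtypes: pd.Series) -> pd.DataFrame:
    """Hypothetical helper: cast every column whose dtype differs from the stored table.

    A column that arrived as object because it was entirely None is cast to
    float64, which turns each None into NaN.
    """
    mismatched = {
        col: dtype
        for col, dtype in stored_dtypes.items()
        if chunk[col].dtype != dtype
    }
    return chunk.astype(mismatched) if mismatched else chunk


# Stand-in for pd.read_hdf(path, key).dtypes of the existing table.
stored = pd.DataFrame({
    "Customer_Id": pd.Series(["A1"], dtype=object),
    "As_of_Date": pd.to_datetime(["2015-06-01"]),
    "Net_Bal_Amt": [100.0],
    "Loan_Current_Rate": [0.045],
}).dtypes

# A chunk whose Loan_Current_Rate column is NULL in every row.
chunk = pd.DataFrame({
    "Customer_Id": ["B7", "C9"],
    "As_of_Date": pd.to_datetime(["2015-06-01", "2015-06-01"]),
    "Net_Bal_Amt": [75.25, 10.0],
    "Loan_Current_Rate": [None, None],
})

fixed = align_to_stored_dtypes(chunk, stored)
print(fixed.dtypes)
```

Only the mismatched columns are cast, so columns that already agree with the store are left untouched.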
  • show the existing store dtypes, e.g. ``pd.read_hdf('store.h5','df').dtypes``, and what you are appending. – Jeff Jun 23 '15 at 23:24
  • further, show pandas, Python, and PyTables versions – Jeff Jun 23 '15 at 23:24
  • updated per request. Can't show data being appended though. – asdf Jun 24 '15 at 20:34
  • you need to show ``df.dtypes`` for what you are appending as well. – Jeff Jun 24 '15 at 21:34
  • well, there you go: your loan column is object and the stored one is not – Jeff Jun 24 '15 at 22:24
  • Any idea why it changes midway? It's pulled from an Oracle DB, stored in a generator frame, then pushed to the HDF. Can I resolve this by changing Loan_Current_Rate to object? – asdf Jun 24 '15 at 22:28
  • you don't want things as object unless they are strings; you can try astype(float), or better yet use read_sql, which will infer types – Jeff Jun 24 '15 at 22:34
  • Thanks, I'll try that. But do you know WHY it changes midway? The DataFrame is pulled from Oracle via pd.read_sql_table... doesn't that infer types? – asdf Jun 25 '15 at 13:23
  • well, since I can't see what you are doing, no idea. You might have it incorrectly typed in the db. – Jeff Jun 25 '15 at 13:43
  • In the db all the numeric types are NUMBER. As you suggested, `chunk = chunk[[column names]].astype(float)` works, but takes a serious performance hit. I think the issue here IS that read_sql_table INFERS types: a column in a chunk that happens to have all ints may be set to type int (or object, in the case above) automatically, when it really should be float64 like all the previous chunks. Is there an elegant way to turn off the type inference? – asdf Jun 26 '15 at 22:02
  • well, you might have nulls or something. ``.astype`` should be quite fast. You have something else going on. – Jeff Jun 26 '15 at 22:32
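The per-chunk inference discussed in these comments can be sketched end to end. This is a minimal, self-contained sketch that swaps Oracle and read_sql_table for an in-memory SQLite database and pd.read_sql_query (read_sql_table needs an SQLAlchemy engine); the table contents and the `TARGET` mapping are hypothetical. With chunksize set, each chunk's dtypes are inferred from that chunk alone, so a chunk whose rates are all NULL can come back as object. Casting each chunk with astype inside the loop keeps every chunk's dtypes identical before it is appended. The store.append call is left out so the sketch runs without PyTables.

```python
import sqlite3

import pandas as pd

# Hypothetical target dtypes, mirroring the table already in the HDF5 store.
TARGET = {"Net_Bal_Amt": "float64", "Loan_Current_Rate": "float64"}

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE loans (Net_Bal_Amt REAL, Loan_Current_Rate REAL)")
con.executemany(
    "INSERT INTO loans VALUES (?, ?)",
    [
        (100.0, 0.045), (250.5, 0.05),  # first chunk: rates present
        (75.25, None), (10.0, None),    # second chunk: rates all NULL
    ],
)

raw_dtypes = []
clean_chunks = []
query = "SELECT * FROM loans ORDER BY rowid"
for chunk in pd.read_sql_query(query, con, chunksize=2):
    # dtype as inferred from this chunk alone
    raw_dtypes.append(str(chunk["Loan_Current_Rate"].dtype))
    # Cast before appending so every chunk matches the stored table;
    # NULLs become NaN. A real loop would call store.append(key, chunk) here.
    clean_chunks.append(chunk.astype(TARGET))

con.close()
print(raw_dtypes)
print([str(c["Loan_Current_Rate"].dtype) for c in clean_chunks])
```

In pandas 1.3 and later, read_sql_query also accepts a `dtype=` mapping that applies the same cast at read time; that argument does not exist in the pandas 0.15.2 release used in the question.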
