
I am trying to use pandas.read_sas() to read binary compressed SAS files in chunks and save each chunk as a separate feather file.

This is my code

import feather as fr
import pandas as pd

pdi = pd.read_sas("C:/data/test.sas7bdat", chunksize = 100000, iterator = True)

i = 1
for pdj in pdi:
    fr.write_dataframe(pdj, 'C:/data/test' + str(i) + '.feather')
    i = i + 1

However I get the following error

ValueError                                Traceback (most recent call last)
 in ()
      1 i = 1
      2 for pdj in pdi:
----> 3     fr.write_dataframe(pdj, 'C:/test' + str(i) + '.feather')
      4     i = i + 1
      5

~\AppData\Local\Continuum\anaconda3\lib\site-packages\pyarrow\feather.py in write_feather(df, dest)
    116     writer = FeatherWriter(dest)
    117     try:
--> 118         writer.write(df)
    119     except:
    120         # Try to make sure the resource is closed

~\AppData\Local\Continuum\anaconda3\lib\site-packages\pyarrow\feather.py in write(self, df)
     94
     95             elif inferred_type not in ['unicode', 'string']:
---> 96                 raise ValueError(msg)
     97
     98             if not isinstance(name, six.string_types):

ValueError: cannot serialize column 0 named SOME_ID with dtype bytes

I am using Windows 7 and Python 3.6. When I inspect the DataFrame, most of the cell values are displayed as b'cell_value', which I assume means those columns hold bytes rather than strings.

I am a complete Python beginner, so I don't understand what the issue is.

xiaodai
  • I don't know anything about feather, but I'd double check that you are successfully converting from SAS to pandas before trying to write out to feather. The ability to read SAS into pandas is pretty great, but definitely works < 100% of the time – JohnE Oct 13 '17 at 02:24
  • I can write out as csv. So must be working – xiaodai Oct 13 '17 at 04:06

1 Answer

Edit: looks like this was a bug patched in a recent version: https://issues.apache.org/jira/browse/ARROW-1672 https://github.com/apache/arrow/commit/238881fae8530a1ae994eb0e283e4783d3dd2855

Are the column names strings? Are you sure pdj is of type pd.DataFrame?
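Both questions can be checked on a single chunk before writing it. The following is only a sketch: the helper name feather_problems is made up for illustration, and the set of inferred types it accepts is an assumption modelled on the check visible in the traceback, not the actual pyarrow logic.

```python
import pandas as pd
from pandas.api.types import infer_dtype


def feather_problems(df):
    """Return a list of likely reasons a chunk cannot be written to Feather.

    Hypothetical helper: it mirrors the checks suggested in this answer,
    not the real validation performed inside pyarrow.
    """
    if not isinstance(df, pd.DataFrame):
        return ["not a DataFrame: " + type(df).__name__]

    problems = []
    for name, col in df.items():
        # Feather requires column names to be strings.
        if not isinstance(name, str):
            problems.append("non-string column name: " + repr(name))
        # Object columns must hold text only; bytes or mixed values fail.
        if col.dtype == object:
            kind = infer_dtype(col, skipna=True)
            if kind not in ("string", "unicode", "empty"):
                problems.append(
                    "column " + repr(name) + " has inferred type " + kind
                )
    return problems
```

Calling feather_problems(pdj) inside the loop, before fr.write_dataframe, should report a column such as SOME_ID with inferred type bytes if the chunk has the problem shown in the error message.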

Limitations

Some features of pandas are not supported in Feather:

  • Non-string column names
  • Row indexes
  • Object-type columns with non-homogeneous data
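If upgrading pyarrow is not an option, one workaround is to turn the bytes values into ordinary strings before writing each chunk. This is a minimal sketch, not a definitive fix: the helper name decode_bytes_columns is hypothetical, and the latin-1 default is an assumption that has to be replaced with whatever encoding the SAS file actually uses.

```python
import pandas as pd


def decode_bytes_columns(df, encoding="latin-1"):
    """Return a copy of df with bytes values in object columns decoded to str.

    Hypothetical helper; the default encoding is an assumption.
    """
    out = df.copy()
    for name in list(out.columns):
        col = out[name]
        if col.dtype == object:
            # Decode bytes cells; leave missing values and other types alone.
            out[name] = col.map(
                lambda v: v.decode(encoding) if isinstance(v, bytes) else v
            )
    return out
```

In the loop from the question this would be used as fr.write_dataframe(decode_bytes_columns(pdj), ...). Alternatively, pd.read_sas accepts an encoding argument (for example encoding="latin-1", again an assumed value), which decodes the text columns at read time so that no bytes reach Feather at all.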

techvslife