I am trying to use pandas.read_sas()
to read a binary compressed SAS file in chunks and save each chunk as a separate feather file.
This is my code:
import feather as fr
import pandas as pd

pdi = pd.read_sas("C:/data/test.sas7bdat", chunksize=100000, iterator=True)
i = 1
for pdj in pdi:
    fr.write_dataframe(pdj, 'C:/data/test' + str(i) + '.feather')
    i = i + 1
However, I get the following error:
ValueError                                Traceback (most recent call last)
<ipython-input> in <module>()
      1 i = 1
      2 for pdj in pdi:
----> 3     fr.write_dataframe(pdj, 'C:/test' + str(i) + '.feather')
      4     i = i + 1
      5

~\AppData\Local\Continuum\anaconda3\lib\site-packages\pyarrow\feather.py in write_feather(df, dest)
    116     writer = FeatherWriter(dest)
    117     try:
--> 118         writer.write(df)
    119     except:
    120         # Try to make sure the resource is closed

~\AppData\Local\Continuum\anaconda3\lib\site-packages\pyarrow\feather.py in write(self, df)
     94
     95     elif inferred_type not in ['unicode', 'string']:
---> 96         raise ValueError(msg)
     97
     98     if not isinstance(name, six.string_types):
ValueError: cannot serialize column 0 named SOME_ID with dtype bytes
I am using Windows 7 and Python 3.6. When I inspect the chunks, most of the columns' cells are wrapped as b'cell_value',
which I take to mean the columns are stored as raw bytes rather than strings.
I am a complete Python beginner, so I don't understand what the issue is.
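My guess is that the byte columns would need to be decoded to strings before writing. Here is a rough sketch of what I mean (the helper function and the sample DataFrame are my own invention, and I am assuming the data is UTF-8 encoded, which may not hold for my actual file):

```python
import pandas as pd

def decode_bytes_columns(df):
    """Decode any object columns containing bytes values to str (assumes UTF-8)."""
    for col in df.columns:
        # Only touch object columns that actually hold bytes
        if df[col].dtype == object and df[col].map(type).eq(bytes).any():
            df[col] = df[col].str.decode("utf-8")
    return df

# Hypothetical frame mimicking what read_sas returns for my data
df = pd.DataFrame({"SOME_ID": [b"A1", b"B2"], "value": [1.0, 2.0]})
df = decode_bytes_columns(df)
print(df["SOME_ID"].tolist())  # → ['A1', 'B2']
```

Is something like this the right direction, or is there a way to make read_sas return strings directly?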