I have an SPSS (.sav) file with over 90,000 columns and around 1,800 rows. I have previously used the code below (taken from this answer), which has worked well.
import savReaderWriter as spss  # assuming the module is imported under this alias
import pandas as pd

raw_data = spss.SavReader('largefile.sav', returnHeader=True)  # first row returned is the header
raw_data_list = list(raw_data)                                 # read every row into memory
data = pd.DataFrame(raw_data_list)
data = data.rename(columns=data.loc[0]).iloc[1:]               # promote the header row to column labels
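For context, the last line is just the usual header-promotion step: with returnHeader=True the variable names come back as the first data row, so they are copied into the column labels and that row is dropped. A toy sketch of the same idiom in plain pandas, with made-up values:

import pandas as pd

rows = [['id', 'name'], [1, 'Anna'], [2, 'Li']]        # first row plays the role of the header
df = pd.DataFrame(rows)
df = df.rename(columns=df.loc[0]).iloc[1:]             # promote row 0 to column labels, then drop it
print(df.columns.tolist())                             # ['id', 'name']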
However, some of the columns now contain special characters (including Chinese characters and accented characters). From the documentation, it appears that passing ioUtf8=True to SavReader should achieve what I'm aiming for, so I do the following:
raw_data = spss.SavReader('largefile.sav', returnHeader=True, ioUtf8=True)
raw_data_list = list(raw_data)
data = pd.DataFrame(raw_data_list)
data = data.rename(columns=data.loc[0]).iloc[1:]
The first line (the SavReader call) runs fine, but the second line (list(raw_data)) raises the following error:
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe0 in position 6: invalid continuation byte
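For what it's worth, this looks like string data stored in a single-byte codepage (e.g. Latin-1/Windows-1252) rather than UTF-8, which is what the error suggests ioUtf8=True is trying to decode. A minimal sketch, independent of savReaderWriter and using a made-up byte string containing the offending 0xe0 byte, reproduces the same failure:

raw = b'caf\xe0 bar'              # hypothetical value with a lone 0xe0 high byte
try:
    raw.decode('utf-8')
except UnicodeDecodeError as err:
    print(err)                    # 'utf-8' codec can't decode byte 0xe0 in position 3: invalid continuation byte
print(raw.decode('latin-1'))      # 'cafà bar' -- a single-byte codepage decodes it fine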
How can I get around the problem?