Read zipped txt file as pandas dataframe

Question

I am trying to read a zipped txt file as a pandas dataframe. Although the file is a txt file after unzipping, it contains comma-separated values.

Following the answer from here, I used:

import pandas as pd

path = 'data_folder/data.2020.ZIP'
df = pd.read_csv(path, compression='zip', header=None, sep=',')
print(df.head())

But it is throwing this error:

ParserError: Error tokenizing data. C error: Expected 37 fields in line 23, saw 80

I am using Python 3.6 with pandas version 0.24.2. Would upgrading pandas help?

Check the `sep=','` in the txt file, as the error shows that line 23 does not follow this separation. — Himanshu, Mar 01 '21 at 12:37
@Himanshu yes, that's happening because the initial rows contain fewer columns while later rows contain more. I tried using the `usecols` argument but it didn't help. — Ank, Mar 01 '21 at 12:49
You should check the unzipped data. Showing the first rows and row 23 here could help. — Serge Ballesta, Mar 01 '21 at 12:50
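
One way to follow the suggestion in the comments is to count the fields on each line without extracting the archive by hand. The sketch below is only an illustration: the helper name `field_counts` is hypothetical, and it assumes the archive holds a single UTF-8 text file.

```python
import csv
import io
import zipfile
from collections import Counter


def field_counts(zip_path, sep=",", encoding="utf-8"):
    """Map each field count to the number of lines that have it."""
    with zipfile.ZipFile(zip_path) as zf:
        member = zf.namelist()[0]  # assumption: one text file in the archive
        with zf.open(member) as raw:
            # Wrap the binary stream so csv.reader receives text lines
            text = io.TextIOWrapper(raw, encoding=encoding, newline="")
            return Counter(len(row) for row in csv.reader(text, delimiter=sep))
```

Calling `field_counts('data_folder/data.2020.ZIP')` would show whether the file mixes rows of several widths, which is what the `ParserError` above suggests.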

score 0 · Accepted Answer · answered Mar 01 '21 at 16:04

This was happening because the rows have an irregular number of columns. Since I don't want to drop any data, I used the `names` argument with the maximum number of columns to fix the issue:

path = 'data_folder/data.2020.ZIP'
df = pd.read_csv(path, compression='zip', header=None, sep=',', names=range(80))
print(df.head())
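
If the maximum number of columns is not known in advance, it could be measured first instead of hardcoding 80. This is a minimal sketch rather than part of the accepted answer: `max_field_count` and `read_ragged_zip` are hypothetical helper names, and the code assumes the archive holds a single UTF-8 text file.

```python
import csv
import io
import zipfile

import pandas as pd


def max_field_count(zip_path, sep=",", encoding="utf-8"):
    """Return the largest number of fields found on any line."""
    with zipfile.ZipFile(zip_path) as zf:
        member = zf.namelist()[0]  # assumption: one text file in the archive
        with zf.open(member) as raw:
            text = io.TextIOWrapper(raw, encoding=encoding, newline="")
            rows = csv.reader(text, delimiter=sep)
            return max((len(row) for row in rows), default=0)


def read_ragged_zip(zip_path, sep=","):
    """Read a zipped delimited file whose rows differ in width."""
    n_cols = max_field_count(zip_path, sep=sep)
    # Supplying n_cols names lets the parser pad shorter rows with NaN
    return pd.read_csv(zip_path, compression="zip", header=None,
                       sep=sep, names=range(n_cols))
```

The trade-off is that the file is read twice, once to measure the widest row and once to parse it, which may matter for very large archives.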
