I am trying to read a zipped txt file into a pandas DataFrame. Although the unzipped file has a .txt extension, it contains comma-separated values.

Following the answer from here, I used:

import pandas as pd

path = 'data_folder/data.2020.ZIP'
df = pd.read_csv(path, compression='zip', header=None, sep=',')
print(df.head())

But it is throwing this error:

ParserError: Error tokenizing data. C error: Expected 37 fields in line 23, saw 80

I am using Python 3.6 with pandas 0.24.2. Would upgrading pandas help?

Ank
  • Check the `sep=','` in the txt file, as the error shows line 23 does not follow this separation. – Himanshu Mar 01 '21 at 12:37
  • @Himanshu yes, that's happening because the initial rows contain fewer columns while later rows contain more. I tried using the `usecols` argument but it didn't help – Ank Mar 01 '21 at 12:49
  • You should control the unzipped data. Showing here the first rows and the row 23 could help. – Serge Ballesta Mar 01 '21 at 12:50

1 Answer

This was happening because the rows have an irregular number of columns. Since I don't want to drop any data, I used the `names` argument with the maximum number of columns to fix the issue, like so:

import pandas as pd

path = 'data_folder/data.2020.ZIP'
# Supplying 80 column names up front lets shorter rows be padded with NaN
df = pd.read_csv(path, compression='zip', header=None, sep=',', names=range(80))
print(df.head())
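
The 80 above comes from the error message, so it is only known to be large enough for the rows the parser had reached. If the widest row is not known in advance, it could be counted first. This is a minimal sketch using only the standard library; `max_field_count` is a hypothetical helper name, and it assumes the archive holds a single UTF-8 text file unless `member` is given:

```python
import csv
import io
import zipfile


def max_field_count(zip_path, member=None, sep=",", encoding="utf-8"):
    """Return the largest number of fields on any row of a zipped delimited file."""
    with zipfile.ZipFile(zip_path) as archive:
        # Assumption: use the first file in the archive unless a member is named.
        name = member if member is not None else archive.namelist()[0]
        with archive.open(name) as raw:
            # Decode the binary stream so csv.reader can split each row.
            text = io.TextIOWrapper(raw, encoding=encoding, newline="")
            reader = csv.reader(text, delimiter=sep)
            # default=0 covers an empty file.
            return max((len(row) for row in reader), default=0)
```

With that, `names=range(max_field_count(path))` could stand in for the hard-coded `range(80)`, at the cost of reading the file twice.
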
Ank