How can I fix "Error tokenizing data" on pandas csv reader?

Question

I'm trying to read a csv file with pandas.

This file actually has only one row but it causes an error whenever I try to read it.

Something seems to go wrong at line 8, but I can't find an 8th line, since the file clearly contains only one row.

This is what I run:

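import codecs
import pandas as pd

# Mode "r" instead of "rU": Python 3.11+ no longer accepts "U" in the file mode
with codecs.open("path_to_file", "r", "Shift-JIS", "ignore") as file:
    df = pd.read_csv(file, header=None, sep="\t")
df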

Then I get:

ParserError: Error tokenizing data. C error: Expected 1 fields in line 8, saw 3

I don't get what's really going on, so any of your advice will be appreciated.

score 14 · Answer 1 · answered Jun 16 '20 at 13:54

I struggled with this for almost half a day. I opened the CSV in Notepad, noticed that the separator is a tab rather than a comma, and then tried the combination below.

df = pd.read_csv('C:\\myfile.csv',sep='\t', lineterminator='\r')
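If you'd rather not open the file in an editor, printing the raw text with repr() makes tabs and carriage returns visible, and counting fields per line shows where the widths disagree. This is only a rough sketch: the helper names guess_delimiter and count_fields_per_line and the sample string are made up for illustration, and picking the most frequent candidate character is a crude heuristic that can be fooled by quoted fields.

```python
def guess_delimiter(text, candidates=(",", "\t", ";", "|")):
    """Return the candidate character that occurs most often (crude heuristic)."""
    return max(candidates, key=text.count)


def count_fields_per_line(text, delimiter):
    """Return the number of delimiter-separated fields on each line."""
    # str.splitlines() treats \r, \n and \r\n all as line breaks
    return [line.count(delimiter) + 1 for line in text.splitlines()]


# Hypothetical sample: tab-separated fields, lines terminated by a bare \r
sample = "a\tb\tc\r1\t2\t3\r4\t5\t6\r"

print(repr(sample[:20]))  # repr() shows \t and \r instead of invisible whitespace
delimiter = guess_delimiter(sample)
fields = count_fields_per_line(sample, delimiter)
print(repr(delimiter), fields)
```

For a real file, the sample could come from reading the first few hundred characters with the file's encoding; if the field counts differ between lines, that is the kind of mismatch the tokenizer reports.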

Hietsh Kumar

score 7 · Answer 2 · answered Nov 12 '18 at 04:50

Try df = pd.read_csv(file, header=None, error_bad_lines=False)
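Note that error_bad_lines was deprecated in pandas 1.3 and removed in pandas 2.0; the replacement is the on_bad_lines parameter. A minimal sketch, assuming pandas 1.3 or later and a made-up in-memory sample in place of the real file:

```python
import io

import pandas as pd

# Hypothetical sample: the third line has three fields, the others have one
raw = "a\nb\nc,d,e\nf\n"

# on_bad_lines="skip" silently drops rows with too many fields;
# on_bad_lines="warn" would skip them too but report each one.
df = pd.read_csv(io.StringIO(raw), header=None, on_bad_lines="skip")
print(df)
```

As with error_bad_lines=False, the skipped rows are simply lost, so this only helps when those rows are not needed.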

Po Xin

Thanks so much for your comment Po Xin, I've tried that and got another error like this `ParserError: Error tokenizing data. C error: Buffer overflow caught - possible malformed input file.` – user9191983 Nov 12 '18 at 04:53
try this https://stackoverflow.com/questions/33998740/error-in-reading-a-csv-file-in-pandascparsererror-error-tokenizing-data-c-err – Po Xin Nov 12 '18 at 04:59
How can I also stop the errors from being shown in the terminal? – M. Mariscal Feb 19 '20 at 09:28

score 3 · Answer 3 · answered Apr 05 '19 at 18:30

The existing answer will not include these additional lines in your dataframe. If you'd like your dataframe to be as wide as its widest point, you can use the following:

import pandas as pd
delimiter = ','
with open(path_name, 'r') as f:
    # columns in the widest row = its delimiter count + 1
    max_columns = max(line.count(delimiter) for line in f) + 1
df = pd.read_csv(path_name, header=None, skiprows=1, names=list(range(max_columns)))

Set skiprows = 1 only if the file actually has a header row; you can always retrieve the header's column names later. You can also identify rows that have more columns populated than the original header has column names.
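To illustrate that last point, here is one possible way to pick out the rows that spill past the header. It is a sketch under assumptions: the sample data, the header_width variable and the overflow name are invented for the example, and the width of the original header is taken as known.

```python
import io

import pandas as pd

# Hypothetical sample: a 2-column header, with one row carrying two extra fields
raw = "id,name\n1,alice\n2,bob,extra1,extra2\n3,carol\n"
delimiter = ","
header_width = 2  # number of names in the original header (assumed known)

# Width of the widest row: its delimiter count plus one
max_columns = max(line.count(delimiter) for line in raw.splitlines()) + 1

df = pd.read_csv(io.StringIO(raw), header=None, skiprows=1,
                 names=list(range(max_columns)))

# A row overflows if any column beyond the header's width holds a value
overflow = df[df.iloc[:, header_width:].notna().any(axis=1)]
print(overflow)
```

Shorter rows are padded with NaN, so checking for non-null values in the extra columns is enough to find the rows that are wider than the header.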
