10

I'm trying to read a csv file with pandas.

The file appears to contain only one row, but reading it always raises an error.

The error points to line 8, but I can't find an 8th line, since the file clearly has only one row.

Here is what I do:

import codecs
import pandas as pd

with codecs.open("path_to_file", "rU", "Shift-JIS", "ignore") as file:
    df = pd.read_csv(file, header=None, sep="\t")
df

Then I get:

ParserError: Error tokenizing data. C error: Expected 1 fields in line 8, saw 3

I don't understand what's really going on, so any advice would be appreciated.

peterh
user9191983

3 Answers

14

I struggled with this for almost half a day. I opened the CSV in Notepad, noticed that the separator is a TAB rather than a comma, and then tried the combination below.

df = pd.read_csv('C:\\myfile.csv',sep='\t', lineterminator='\r')
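
Instead of opening the file in Notepad, the same check can be done on the raw bytes. This is only a minimal sketch: `describe_sample` is a hypothetical helper (not part of pandas), and the sample bytes are made up to mimic a tab-separated file whose lines end in a bare `\r`.

```python
def describe_sample(raw: bytes) -> dict:
    """Count candidate separators and line terminators in a chunk of raw bytes."""
    crlf = raw.count(b"\r\n")
    return {
        "tabs": raw.count(b"\t"),
        "commas": raw.count(b","),
        "crlf": crlf,
        # Bare \r or \n are whatever is left once the \r\n pairs are subtracted.
        "cr_only": raw.count(b"\r") - crlf,
        "lf_only": raw.count(b"\n") - crlf,
    }


# Made-up sample; for a real file, something like open(path, "rb").read(4096)
# would supply the bytes (path is an assumption here).
sample = b"a\tb\tc\r1\t2\t3\r4\t5\t6\r"
report = describe_sample(sample)
print(report)
```

If the report shows many tabs, no commas, and only bare `\r` terminators, that would suggest `sep='\t'` and `lineterminator='\r'` as reasonable values to try.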
Hietsh Kumar
7

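Try df = pd.read_csv(file, header=None, error_bad_lines=False). In pandas 1.3 and later, use on_bad_lines='skip' instead; error_bad_lines was removed in pandas 2.0.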

Po Xin
  • Thanks so much for your comment, Po Xin. I've tried that and got another error: `ParserError: Error tokenizing data. C error: Buffer overflow caught - possible malformed input file.` – user9191983 Nov 12 '18 at 04:53
  • try this https://stackoverflow.com/questions/33998740/error-in-reading-a-csv-file-in-pandascparsererror-error-tokenizing-data-c-err – Po Xin Nov 12 '18 at 04:59
  • How can I also stop the errors from being shown in the terminal? – M. Mariscal Feb 19 '20 at 09:28
3

The existing answer will not include these additional lines in your dataframe. If you'd like your dataframe to be as wide as its widest point, you can use the following:

delimiter = ','
# The widest row has the most delimiters; the column count is that number plus one
with open(path_name, 'r') as f:
    max_columns = max(line.count(delimiter) for line in f) + 1
df = pd.read_csv(path_name, header=None, skiprows=1, names=list(range(max_columns)))

Set skiprows = 1 only if there's actually a header; you can always retrieve the header column names later. You can also identify rows that have more columns populated than the number of column names in the original header.
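
One way to find those over-wide rows is to count the fields on each line with the standard `csv` module before handing the file to pandas. This is a rough sketch, not part of the answer's original code: `field_counts` is a hypothetical helper, and the sample text is invented for illustration.

```python
import csv
import io


def field_counts(text: str, delimiter: str = ","):
    """Return (line_number, field_count) for every row in the text."""
    reader = csv.reader(io.StringIO(text, newline=""), delimiter=delimiter)
    counts = []
    for row in reader:
        # reader.line_num is the number of source lines read so far
        counts.append((reader.line_num, len(row)))
    return counts


# Invented sample: the third line has more fields than the header.
sample = "id,name\n1,alice\n2,bob,extra,fields\n3,carol\n"
counts = field_counts(sample)
expected = counts[0][1]  # width of the first (header) line
too_wide = [(n, w) for n, w in counts if w > expected]
print(too_wide)
```

The line numbers reported this way should correspond to the line number in a message such as "Expected 1 fields in line 8, saw 3", provided the same delimiter and line endings are used.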

Adam Zeldin