
The csv file has the following structure:

a,b,c
a,b,c,d,e,f,g
a,b,c,d
a,b,c

If I use file = pd.read_csv('Desktop/export.csv', delimiter=','), it throws a tokenizing error like this: pandas.errors.ParserError: Error tokenizing data. C error: Expected 9 fields in line 3, saw 10

I do NOT want to skip the bad lines. I want to read the csv with all of its columns and create a dataframe that looks like this:

unnamed column1, unnamed column2, ....... unnamed column 7
a,b,c
a,b,c,d,e,f,g
a,b,c,d
a,b,c

How can I load the bad lines from the csv file?
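
In other words, I think read_csv would need to know about all 7 columns up front. A minimal sketch of that idea (assuming the widest row always has 7 fields and the file has no header row) might be something like:

import pandas as pd

# Giving read_csv explicit names for 7 columns means shorter rows are
# padded with NaN instead of raising ParserError (assumes no header row
# and that no row has more than 7 fields).
file = pd.read_csv('Desktop/export.csv',
                   header=None,
                   names=[f'column{i}' for i in range(1, 8)])

but I am not sure whether that is the right approach.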

ichino

1 Answer


You can set error_bad_lines to False.

import pandas as pd

# With error_bad_lines=False, rows with too many fields are dropped
# instead of raising a ParserError.
file = pd.read_csv('Desktop/export.csv', delimiter=',', error_bad_lines=False)
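
Note that error_bad_lines was deprecated in pandas 1.3 and removed in pandas 2.0. On newer versions the rough equivalent (a sketch, assuming pandas >= 1.3) is:

import pandas as pd

# on_bad_lines='skip' replaces error_bad_lines=False on pandas >= 1.3:
# rows with too many fields are dropped instead of raising ParserError.
file = pd.read_csv('Desktop/export.csv', delimiter=',', on_bad_lines='skip')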
iohans