How to remove double quote from a csv file before reading it?

Question

I am getting the following error:

pandas.errors.ParserError: '|' expected after '"'

The cause is a stray '"' at the start of the first line that shouldn't be there:

"Name|Kind|Color|Price

I tried the following:

`pd.read_csv(filename, sep='|', usecols=fields, engine='python')`

Which produces the above error.

pd.read_csv(filename, sep='|', usecols=fields, engine='python', quotechar='"', error_bad_lines=False)

This doesn't work because it drops the whole line, which I need because it contains the column headers.

Is there a way to fix this without rewriting the file? Maybe read it into a string and remove '"', but then how do I read that string with the following?

pd.read_csv(filename, sep='|', usecols=fields, engine='python')

score 0 · Answer 1 · answered Nov 10 '20 at 20:05

I am not totally sure about your problem but given a csv file like:

"Name|Kind|Color|Price
alex|robot|braun|100$

then the following code will remove any leading '"' if present:

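import pandas as pd
import re

# Strip any leading double quotes from each line, then split on the separator.
# `.` does not match the newline, so the trailing "\n" is dropped as well.
with open("tmp.csv") as f:
    df = pd.DataFrame([
        re.match(r'"*(?P<line>.*)', line).group("line").split("|")
        for line in f
    ])

print(df)
# Output (the header line ends up as row 0):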
#       0      1      2      3
# 0  Name   Kind  Color  Price
# 1  alex  robot  braun   100$
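
As for the "read it into a string" idea from the question, here is a minimal sketch, assuming the stray quote can simply be deleted everywhere in the text. The helper name `read_unquoted` and the sample data are made up for illustration. `io.StringIO` wraps the cleaned string so that `pd.read_csv` can read it like a file, and the first line is used as the header as usual:

```python
import io

import pandas as pd


def read_unquoted(text, sep="|", usecols=None):
    """Hypothetical helper: drop every double quote, then parse the text as CSV."""
    cleaned = text.replace('"', "")
    # StringIO makes the in-memory string look like an open file to read_csv.
    return pd.read_csv(io.StringIO(cleaned), sep=sep, usecols=usecols)


# Sample content standing in for the real file (e.g. the result of f.read()).
sample = '"Name|Kind|Color|Price\nalex|robot|braun|100$\n'
df = read_unquoted(sample, usecols=["Name", "Price"])
print(df)
```

Note that this removes all double quotes, so it is probably not suitable if other fields rely on quoting.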

garlic_rat · Answer 2 · 2020-11-10T21:52:21.453

You understand the issue.

Read the first line separately (see Ashwini Chaudhary's method). Once you have the line, remove the double quote and split the line using your separator.

import pandas as pd

# Initialize your separator and filename
sep = '|'
filename = 'some.csv'

# Read the first line as plain text and remove the double quote.
# (csv.reader would return a list, which has no .replace() method.)
with open(filename, newline='') as f:
    row1 = f.readline().rstrip('\r\n')
    cols = row1.replace('"', '').split(sep)

Using the cols list, call pandas.read_csv, skipping the first line (so there is no header row left in the data) and specifying the column names with the cols list you just extracted.

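df = pd.read_csv(filename,
                 sep=sep,
                 skiprows=1,     # skip the malformed header line
                 header=None,    # no header row remains; header=0 would swallow the first data row
                 names=cols,
                 engine='python')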

read_csv assumes you want all the columns defined by the separators in the first row. If you want only a subset, keep names=cols as the full list and pass the columns you want via usecols.
