8

I have a file that looks like:

'colA'|'colB'
'word"A'|'A'
'word'B'|'B'

I want to use pd.read_csv('input.csv',sep='|', quotechar="'") but I get the following output:

colA    colB
word"A   A
wordB'   B

The last row is not correct, it should be word'B B. How do I get around this? I have tried various iterations but none of them word that reads both rows correctly. I need some csv reading expertise!

jezrael
  • 822,522
  • 95
  • 1,334
  • 1,252
A. Jameel
  • 81
  • 1
  • 1
  • 2

2 Answers2

6

I think you need str.strip with apply:

import pandas as pd
import io

temp=u"""'colA'|'colB'
'word"A'|'A'
'word'B'|'B'"""

#after testing replace io.StringIO(temp) to filename
df = pd.read_csv(io.StringIO(temp), sep='|')

df = df.apply(lambda x: x.str.strip("'"))
df.columns = df.columns.str.strip("'")
print (df)
     colA colB
0  word"A    A
1  word'B    B
jezrael
  • 822,522
  • 95
  • 1,334
  • 1,252
5

The source of the problem is that ' is defined as quote, and as a regular char.

You can escape it e.g.

'colA'|'colB'
'word"A'|'A'
'word/'B'|'B'

And then use escapechar:

>>> pd.read_csv('input.csv',sep='|',quotechar="'",escapechar="/")
     colA colB
0  word"A    A
1  word'B    B

Also You can use: quoting=csv.QUOTE_ALL - but the output will include the quote chars

>>> import pandas as pd
>>> import csv
>>> pd.read_csv('input.csv',sep='|',quoting=csv.QUOTE_ALL)
     'colA' 'colB'
0  'word"A'    'A'
1  'word'B'    'B'
>>>
Yaron
  • 10,166
  • 9
  • 45
  • 65