1

I have a large txt file containing many words and characters, and I'm trying to read it into a pandas DataFrame with each word or character in a separate row.

The problem is that " is one of the characters, and read_csv reads everything between two " as a single word (because of quoting).

How can I make read_csv treat this character as a regular character rather than as a quote character? I tried playing with the parameters of read_csv but couldn't fix it.

My code now: data = pd.read_csv(filepath, header=None, delimiter = "\t")

Thanks in advance!

abc123

2 Answers

0

You can use the quotechar parameter to set the quote character to one that does not occur in the file (here ~):

import pandas as pd

data = pd.read_csv("a.txt", sep=r"\s+", header=None, quotechar="~")
print(data.head())

a.txt

abc def xyz
"abc xyz" def

Output

      0     1    2
0   abc   def  xyz
1  "abc  xyz"  def

Note that the quote characters are kept in the values this way.
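
Another option, sketched here on the assumption that the file is tab-delimited as in the question, is to switch quoting off altogether with quoting=csv.QUOTE_NONE, so that " is parsed like any other character and no placeholder quote character has to be chosen. The sample string below is made up for illustration and stands in for the real file.

```python
import csv
import io

import pandas as pd

# Hypothetical sample standing in for the tab-delimited file from the question.
sample = 'abc\tdef\txyz\n"abc\txyz"\tdef\n'

# quoting=csv.QUOTE_NONE disables quote handling, so " is an ordinary character.
data = pd.read_csv(io.StringIO(sample), sep="\t", header=None, quoting=csv.QUOTE_NONE)
print(data)
```

With a real file, pass the file path instead of the io.StringIO object.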

0

Try NumPy's genfromtxt() function:

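import numpy as np
import pandas as pd

# Read the data rows as strings, skipping the header line
data = np.genfromtxt('data.csv', dtype='str', delimiter='\t', skip_header=1)

# Read only the header line by skipping as many footer lines as there are data rows
columns = np.genfromtxt('data.csv', dtype='str', delimiter='\t', skip_footer=len(data))

Finally:

df = pd.DataFrame(data=data, columns=columns)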
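
Since the file in the question has no header row (header=None), the same idea can be sketched without the header handling. This assumes every line has the same number of tab-separated fields (genfromtxt raises an error otherwise) and that the data contains no # characters, which genfromtxt treats as comment markers by default. The sample text is hypothetical. As far as I know, genfromtxt does no quote processing, so " stays part of the field.

```python
import io

import numpy as np
import pandas as pd

# Hypothetical headerless, tab-delimited sample containing literal " characters.
sample = 'abc\tdef\n"x\ty"\n'

# Read every field as a string; the " characters are kept as part of the values.
arr = np.genfromtxt(io.StringIO(sample), dtype=str, delimiter='\t')

# Without a header row, let pandas assign default integer column labels.
df = pd.DataFrame(arr)
print(df)
```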
Anurag Dabas