
I have a CSV file that uses "$&$" as a custom delimiter, like this:

$&$value$&$,$&$type$&$,$&$text$&$
$&$N$&$,$&$N$&$,$&$text of the message$&$
$&$N$&$,$&$F$&$,$&$text of the message_2$&$
$&$B$&$,$&$N$&$,$&$text of the message_3$&$

and I'm not able to parse it with the following code:

df = pd.read_csv('messages.csv', delimiter='$\&$', engine='python')

Can you help me, please?

  • Check out the docs regarding delimiter: https://pandas.pydata.org/docs/reference/api/pandas.read_csv.html. Another option would be to read in the file with open() and use string replace to swap '$&$' for a delimiter that better suits your document, maybe '|'. – 1extralime Jul 12 '22 at 18:30
  • Can you edit your question and put there more rows from the file? – Andrej Kesely Jul 12 '22 at 18:35
  • these are the three columns of the dataset – Mohmmad Hadi Jul 12 '22 at 18:42

1 Answer

From the docs:

... separators longer than 1 character and different from '\s+' will be interpreted as regular expressions and will also force the use of the Python parsing engine. Note that regex delimiters are prone to ignoring quoted data. Regex example: '\r\t'.

So, to fix your case it should be like this:

df = pd.read_csv('messages.csv', delimiter=r'\$\&\$,\$\&\$|\$\&\$', usecols=[1, 2, 3], engine='python')

Note that the split produces 2 additional columns, the first one and the last one. They exist because every line starts and ends with $&$. In addition to that, the delimiter between fields is actually $&$,$&$, which is why the regex has two alternatives. usecols then gets rid of the two empty columns.

This is the output from the provided sample:

value type text
N N text of the message
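To see where the two extra columns come from, here is a minimal sketch that runs the same regex over the sample rows from the question. It assumes the rows are held in memory (io.StringIO stands in for messages.csv), and the names `sample` and `pattern` are only illustrative:

```python
import io
import re

import pandas as pd

# Sample rows copied from the question; io.StringIO stands in for messages.csv.
sample = (
    "$&$value$&$,$&$type$&$,$&$text$&$\n"
    "$&$N$&$,$&$N$&$,$&$text of the message$&$\n"
    "$&$N$&$,$&$F$&$,$&$text of the message_2$&$\n"
    "$&$B$&$,$&$N$&$,$&$text of the message_3$&$\n"
)

# Either the full field separator $&$,$&$ or a lone $&$ at the line edges.
pattern = r"\$&\$,\$&\$|\$&\$"

# Splitting one line by hand shows the empty first and last fields.
parts = re.split(pattern, sample.splitlines()[2])
print(parts)

# Regex separators need the Python engine; usecols keeps the three real columns.
df = pd.read_csv(io.StringIO(sample), sep=pattern, engine="python", usecols=[1, 2, 3])
print(df)
```

The empty strings at both ends of `parts` are the two columns that usecols drops.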
lepsch
  • I did this and it worked! But there is still a problem: in the text column, only the first line of each message remained, and the other lines of that row's message are missing. – Mohmmad Hadi Jul 15 '22 at 12:02
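A possible way around the multi-line problem, in the spirit of the open() and string replace suggestion from the comments on the question: treat $&$ as a quote marker rather than a delimiter, so the regular CSV parser can keep line breaks inside a field. This is only a sketch under assumptions that the thread does not confirm: that $&$ never occurs inside a message, and that multi-line messages sit between one opening and one closing $&$. The helper `read_wrapped_csv` and the sample rows below are made up for illustration.

```python
import io

import pandas as pd


def read_wrapped_csv(source):
    """Parse CSV text whose fields are wrapped in $&$ ... $&$ (hypothetical helper)."""
    raw = source.read()
    # Double any literal double quotes first so they survive as CSV escapes,
    # then turn every $&$ wrapper into a standard double quote.
    normalized = raw.replace('"', '""').replace("$&$", '"')
    # The default parser keeps newlines that fall inside quoted fields.
    return pd.read_csv(io.StringIO(normalized))


# Made-up rows: one message spans two lines, another contains double quotes.
sample = (
    "$&$value$&$,$&$type$&$,$&$text$&$\n"
    "$&$N$&$,$&$N$&$,$&$first line\nsecond line$&$\n"
    "$&$B$&$,$&$F$&$,$&$he said \"hi\"$&$\n"
)

df = read_wrapped_csv(io.StringIO(sample))
print(df)
```

For a real file, the same helper could be called as read_wrapped_csv(open('messages.csv', encoding='utf-8', newline='')), assuming the file fits in memory and is UTF-8 encoded.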