How to parse a csv file with a custom delimiter

Question

I have a CSV file that uses "$&$" as a custom delimiter, like this:

$&$value$&$,$&$type$&$,$&$text$&$
$&$N$&$,$&$N$&$,$&$text of the message$&$
$&$N$&$,$&$F$&$,$&$text of the message_2$&$
$&$B$&$,$&$N$&$,$&$text of the message_3$&$

and I'm not able to parse it with the following code:

df = pd.read_csv('messages.csv', delimiter='$\&$', engine='python')

Can you help me, please?

Check out the docs regarding delimiter: https://pandas.pydata.org/docs/reference/api/pandas.read_csv.html. Another option would be to read the file in with open() and use string replace to swap '$&$' for a delimiter that better suits your document, maybe '|'. — 1extralime, Jul 12 '22 at 18:30
Can you edit your question and add more rows from the file? — Andrej Kesely, Jul 12 '22 at 18:35
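
The open()-and-replace idea from the first comment could be sketched roughly as follows. This is only an illustration under some assumptions: the helper name to_pipe_separated is made up here, the sample string stands in for the contents of messages.csv, and '|' is assumed never to occur in the data.

```python
import io

import pandas as pd

# Stand-in for the file contents; with the real file this would be
# something like: with open("messages.csv") as f: raw = f.read()
raw = (
    "$&$value$&$,$&$type$&$,$&$text$&$\n"
    "$&$N$&$,$&$N$&$,$&$text of the message$&$\n"
    "$&$N$&$,$&$F$&$,$&$text of the message_2$&$\n"
    "$&$B$&$,$&$N$&$,$&$text of the message_3$&$\n"
)


def to_pipe_separated(line, marker="$&$"):
    """Hypothetical helper: drop the wrapping markers, join fields with '|'."""
    # Remove the marker that opens and closes every line.
    if line.startswith(marker) and line.endswith(marker):
        line = line[len(marker):-len(marker)]
    # What separates two fields is marker + comma + marker.
    return line.replace(marker + "," + marker, "|")


cleaned = "\n".join(to_pipe_separated(line) for line in raw.splitlines())

# Assumption: '|' does not appear inside any field.
df = pd.read_csv(io.StringIO(cleaned), sep="|")
print(df)
```

Because the separator is now a single character, the default parser can be used and no columns need to be discarded afterwards.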

lepsch · Answer 1 · 2022-07-12T19:48:35.160

From the docs:

... separators longer than 1 character and different from '\s+' will be interpreted as regular expressions and will also force the use of the Python parsing engine. Note that regex delimiters are prone to ignoring quoted data. Regex example: '\r\t'.

So, to fix your case it should be like this:

df = pd.read_csv('messages.csv', delimiter=r'\$\&\$,\$\&\$|\$\&\$', engine='python', usecols=[1,2,3])

Note that this produces 2 additional columns, the first one and the last one. They exist because every line starts and ends with $&$. In addition, the delimiter between fields is actually $&$,$&$. So usecols gets rid of the extra columns.

This is the output from the provided sample:

value	type	text
N	N	text of the message
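
To try the pattern without the file on disk, here is a small self-contained sketch that feeds the sample rows from the question through io.StringIO. The in-memory sample is an assumption standing in for messages.csv; the delimiter is the same regular expression, written as a raw string.

```python
import io

import pandas as pd

# The sample rows from the question, used in place of messages.csv.
sample = (
    "$&$value$&$,$&$type$&$,$&$text$&$\n"
    "$&$N$&$,$&$N$&$,$&$text of the message$&$\n"
    "$&$N$&$,$&$F$&$,$&$text of the message_2$&$\n"
    "$&$B$&$,$&$N$&$,$&$text of the message_3$&$\n"
)

df = pd.read_csv(
    io.StringIO(sample),
    # Try the full separator "$&$,$&$" first, then the bare "$&$"
    # that opens and closes each line.
    delimiter=r"\$\&\$,\$\&\$|\$\&\$",
    engine="python",    # regex separators need the Python engine
    usecols=[1, 2, 3],  # skip the empty first and last columns
)
print(df)
```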

I did this and it worked! But there is still a problem: in the text column only the first line of each message remained, and the other lines of that message are missing. — Mohmmad Hadi, Jul 15 '22 at 12:02
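
One possible way around the multi-line problem, offered only as a sketch: read_csv with a regex separator splits the input line by line, so a message containing line breaks gets cut. Reading the whole file as one string and pulling out everything between a pair of $&$ markers avoids that. The sample below is invented to show a two-line message, and the approach assumes that every field is wrapped in $&$ and that $&$ never occurs inside a field.

```python
import re

import pandas as pd

# Invented sample with a message that spans two lines; with the real
# file this would be: with open("messages.csv") as f: raw = f.read()
raw = (
    "$&$value$&$,$&$type$&$,$&$text$&$\n"
    "$&$N$&$,$&$N$&$,$&$first line\nsecond line$&$\n"
    "$&$B$&$,$&$N$&$,$&$single line$&$\n"
)

# A field is whatever sits between two markers; DOTALL lets "." match
# newlines so multi-line text stays in one field.
field = re.compile(r"\$&\$(.*?)\$&\$", flags=re.DOTALL)

# Assumption: the header sits on the first physical line.
header = field.findall(raw.splitlines()[0])
values = field.findall(raw)[len(header):]

# Regroup the flat list of fields into rows of len(header) items.
n = len(header)
rows = [values[i:i + n] for i in range(0, len(values), n)]

df = pd.DataFrame(rows, columns=header)
print(df)
```

All columns come back as strings with this approach, so any numeric columns would need converting afterwards.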
