4

I have a string in the following format:

s="part1,part2,part3,part4"

I can split the string into pieces by just invoking the s.split(",") command.

Now, what if the string contains a backslash-escaped comma? Suppose I have the following string:

s="part1,part2,pa\\,rt3,part4"

I'd like to be able to get ["part1","part2","pa,rt3","part4"] as the result.

My initial idea was to replace each \, with a placeholder string that does not occur in the input, split the string with the split command, and then replace the placeholder with a comma in each part.
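A rough sketch of that idea, assuming the NUL character never appears in the input (the function name `split_with_placeholder` and the choice of placeholder are only illustrative):

```python
def split_with_placeholder(s, sep=",", escape="\\", placeholder="\x00"):
    """Split s on sep, treating escape+sep as a literal separator."""
    # Assumption: the placeholder must not occur in the input; fail loudly if it does.
    if placeholder in s:
        raise ValueError("placeholder already occurs in the input")
    # Hide escaped separators, split, then restore them as plain separators.
    hidden = s.replace(escape + sep, placeholder)
    return [part.replace(placeholder, sep) for part in hidden.split(sep)]


s = "part1,part2,pa\\,rt3,part4"
print(split_with_placeholder(s))  # ['part1', 'part2', 'pa,rt3', 'part4']
```

The weak point is the placeholder itself: if it can ever appear in real data, the result is silently wrong, which is why the sketch checks for it first.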

Can you think of a better way to deal with this problem?

Jeff Swensen
Utku Zihnioglu
  • 1
    This looks like a problem for Regex, although you might have two problems now. – wheaties Feb 12 '11 at 01:17
  • Do you also have to deal with backslash-escaped backslashes? – dan04 Feb 12 '11 at 01:41
  • @dan04: Handling only the comma is fine, as that is the delimiter between parts. If a part ends with a backslash, it might cause problems, but that is acceptable in this particular case. So there is no need to deal with backslash-escaped backslashes. – Utku Zihnioglu Feb 12 '11 at 01:48

3 Answers

11

Replacing it with a non-existing string is a nice option.

And otherwise, you could use a regular expression with a negative lookbehind like this:

re.split(r'(?<!\\),', 'part1,part2,pa\\,rt3,part4')
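One thing to be aware of: the lookbehind only decides where to split, so the backslash stays in the resulting part. A minimal sketch that removes it afterwards (the helper name `split_unescaped` is just for illustration, and it assumes a single-character escape, since lookbehinds in `re` must be fixed-width):

```python
import re


def split_unescaped(s, sep=",", escape="\\"):
    # Split only on separators that are not directly preceded by the escape character.
    pattern = r"(?<!" + re.escape(escape) + r")" + re.escape(sep)
    parts = re.split(pattern, s)
    # The lookbehind leaves the escape character in place, so strip it afterwards.
    return [p.replace(escape + sep, sep) for p in parts]


print(re.split(r'(?<!\\),', 'part1,part2,pa\\,rt3,part4'))
# ['part1', 'part2', 'pa\\,rt3', 'part4']
print(split_unescaped('part1,part2,pa\\,rt3,part4'))
# ['part1', 'part2', 'pa,rt3', 'part4']
```

This does not handle backslash-escaped backslashes: a part that ends in an escaped backslash would be glued to the next part.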
Wolph
  • 2
    So this regex says: look back, if there is no \ character, then split. That is exactly what I am looking for! Thanks. – Utku Zihnioglu Feb 12 '11 at 01:22
  • 1
    @funktku: exactly. Most regular expression implementations have negative and positive lookbehind and lookahead assertions. The positive version requires the given pattern to be present at that position, and the negative version accepts anything except that pattern. – Wolph Feb 12 '11 at 01:23
  • Good example for negative lookbehind assertion `(?<!...)`. Thanks for the knowledge. – Senthil Kumaran Feb 12 '11 at 01:37
4

The csv module can handle this as well:

import csv
from io import StringIO

s = 'part1,part2,pa\\,rt3,part4'
f = StringIO(s)

# QUOTE_NONE disables quote handling; escapechar makes '\,' a literal comma
r = csv.reader(f, quoting=csv.QUOTE_NONE, escapechar='\\')
for row in r:
    print(row)

Output

['part1', 'part2', 'pa,rt3', 'part4']
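Since `csv.reader` accepts any iterable of strings, the `StringIO` wrapper can be skipped when there is only one line to parse. A small sketch of that variant (the helper name `split_escaped` is hypothetical):

```python
import csv


def split_escaped(line, sep=",", escape="\\"):
    # Feed the single line to csv.reader as a one-element list and take the first row.
    reader = csv.reader([line], delimiter=sep, quoting=csv.QUOTE_NONE, escapechar=escape)
    return next(reader)


print(split_escaped('part1,part2,pa\\,rt3,part4'))
# ['part1', 'part2', 'pa,rt3', 'part4']
```

Unlike the lookbehind regex, the reader removes the escape character itself, and it should also treat a doubled backslash as a literal backslash, though that is worth checking against your own data.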
Mark Tolonen
-1

BTW, '\' is not an escape character for ',' (comma), so your string would contain a legal word with '\' in it. If you specifically want the \, to be part of the word, then a regex-based solution looks good to me.

Senthil Kumaran
  • 1
    You are mistaken. The page you linked to even gives the following syntax: escapeseq ::= "\" It's true that '\,' is not an escape sequence that has meaning to Python string formatting, but that's not part of the OP's question. For CSV, you do escape the comma with a backslash when comma is also the field separator. – Michael Kent Feb 12 '11 at 17:25