
So I'm trying to solve an issue in my code where an additional backslash seems to be added to some of the substrings in my split list once re.split(regex_pattern, str) is used. The problem goes something like this:

In [63]: str = r'/dir/hello\/hell/dir2/hello\end'

In [64]: regex_pattern = '(hello)'

In [65]: a = re.split(regex_pattern, str)

In [66]: a
Out[66]: ['/dir/', 'hello', '\\/hell/dir2/', 'hello', '\\end']

As you can see, Out[66] shows two substrings containing '\\' where I expected a single '\'. I know this has something to do with how Python interprets backslashes, but I can't figure out why specifically this is happening.

I've also tried making my str variable a raw string, and adding extra backslashes to it (up to four, '\\\\') wherever one exists, e.g.:

In [63]: str = r'/dir/hello\\/hell/dir2/hello\\end'

This still gives the same output.

I am using Python 2.7 on Ubuntu. Sorry if this is a duplicate, but I couldn't find a question whose answer applies to mine.

Jordan Pagni


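This has nothing to do with re.split. In a Python string literal, \ normally starts an escape sequence, so to write a literal \ in a normal (non-raw) string you need to double it. When IPython echoes a string, or a list of strings, it shows the repr, which uses that same doubled form.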
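As a quick check of how the literal forms differ, this sketch counts the backslash characters each one really contains. The variable names are illustrative only, and the code runs on both Python 2.7 and Python 3. Note that a raw string with a doubled backslash keeps both characters, which is relevant to the r'...\\...' attempt in the question.

```python
# Three ways of writing a string that contains backslashes.
raw_single = r'hello\end'    # raw string: the single backslash is kept as-is
plain_double = 'hello\\end'  # normal string: '\\' is an escape for ONE backslash
raw_double = r'hello\\end'   # raw string: both backslashes are kept

for name, value in [('raw_single', raw_single),
                    ('plain_double', plain_double),
                    ('raw_double', raw_double)]:
    # count() reports the real number of backslash characters,
    # while %r (the repr) displays each one doubled.
    print('%s: %d backslash(es), repr=%r' % (name, value.count('\\'), value))
```

The first two literals produce identical strings; only the third actually contains two backslashes.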

Consider your original string:

In [15]: s = r'/dir/hello\/hell/dir2/hello\end'

In [16]: s
Out[16]: '/dir/hello\\/hell/dir2/hello\\end'

In [17]: len(s)
Out[17]: 31
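To see the string without the repr layer, print it instead of letting IPython echo it. A minimal sketch using the same s:

```python
s = r'/dir/hello\/hell/dir2/hello\end'

print(repr(s))        # the repr doubles each backslash for display
print(s)              # printing the string itself shows single backslashes
print(len(s))         # 31: each backslash counts as one character
print(s.count('\\'))  # 2 backslash characters in the whole string
```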

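The doubled backslashes are not extra characters: they exist only in the repr that IPython displays, which is why len counts each backslash once (31 characters, not 33). In a repr, \\ is itself the escape sequence for a single backslash, written that way so it cannot be read as the start of some other escape sequence such as \n.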
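The same applies to the list returned by re.split: echoing the list shows each element's repr, while printing the elements shows their real contents. A sketch, assuming the same input string and pattern as in the question (the pattern is written as a raw string here, which makes no difference for '(hello)'):

```python
import re

s = r'/dir/hello\/hell/dir2/hello\end'
parts = re.split(r'(hello)', s)  # the capturing group keeps 'hello' in the result

# Displaying the list uses repr() for each element, hence '\\' in the output.
print(parts)

# Printing each element shows the actual text: a single backslash where expected.
for part in parts:
    print(part)

# Joining the pieces reproduces the original string exactly,
# so re.split did not add any characters.
assert ''.join(parts) == s
```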

Moses Koledoye