
I've been trying to clean up some text but got stuck on the regex. I finally settled on re.sub, but ended up with a syntax error. Original code:

# Test for name cleanup

import re

input = u'CHEZ MADU 東久留米店(シェマディ)【東京都東久留米市】'

pattern = re.compile(ur'(【(.*?)\】)', re.UNICODE)\

print(re.sub(input, pattern, ''))

It gave me this error:

  File "retest01.py", line 6
    pattern = re.compile(ur'(【(.*?)\】)', re.UNICODE)\
                                      ^
SyntaxError: invalid syntax

I also tried code from another regex thread, "python regular expression with utf8 issue", and it gave the same error.

What could be the source of the problem here?

DatCra

2 Answers


If you drop the r (raw string) prefix and keep only u, it works fine for me. Additionally, I don't think you're calling re.sub correctly; its signature is:

re.sub(pattern, repl, string, count=0, flags=0)
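In other words, the pattern comes first and the string to search comes last; the question's call passes them the other way round. A minimal Python 3 sketch of the intended order, using the question's input string (the names text, bracketed, cleaned and cleaned_too are my own, chosen so the built-in input is not shadowed). A compiled pattern also has its own sub method, which takes the same arguments minus the pattern:

```python
import re

text = 'CHEZ MADU 東久留米店(シェマディ)【東京都東久留米市】'

# Module-level function: pattern first, then the replacement, then the string
cleaned = re.sub('【.*?】', '', text)

# Equivalent call on a precompiled pattern: Pattern.sub(repl, string, count=0)
bracketed = re.compile('【.*?】')
cleaned_too = bracketed.sub('', text)

print(cleaned)
print(cleaned == cleaned_too)  # True
```

The lazy .*? stops at the first closing 】, so several bracketed sections in one string would each be removed separately rather than everything between the first 【 and the last 】.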

This didn't throw an error for me:

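# -*- coding: utf-8 -*-
# (the coding line above is required on Python 2 for the non-ASCII literals)
import re
input = u'CHEZ MADU 東久留米店(シェマディ)【東京都東久留米市】'
# No backslash needed before 】: it is not a regex metacharacter
pattern = re.compile(u'(【(.*?)】)', re.UNICODE)
print(re.sub(pattern, '', input))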

This works on Python 2 and on Python 3.3+, but on Python 3 you don't need the u prefix, because str literals are already Unicode.
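A quick Python 3 check of that claim, as a sketch (the variable names are hypothetical). It also shows that the re.UNICODE flag is redundant for str patterns, since Unicode matching is already the default there; re.ASCII is the flag that opts out:

```python
import re

text = 'CHEZ MADU 東久留米店(シェマディ)【東京都東久留米市】'

# u'...' and '...' produce the same str type in Python 3
same_type = type(u'abc') is type('abc')
print(same_type)  # True

# Unicode matching is the default for str patterns, so the flag changes nothing
with_flag = re.sub('【.*?】', '', text, flags=re.UNICODE)
without_flag = re.sub('【.*?】', '', text)
print(with_flag == without_flag)  # True

# \w matches CJK characters by default; re.ASCII restricts it to [a-zA-Z0-9_]
unicode_words = re.findall(r'\w+', '東京都')
ascii_words = re.findall(r'\w+', '東京都', flags=re.ASCII)
print(unicode_words)  # ['東京都']
print(ascii_words)    # []
```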

JDong

The ur'....' prefix is not valid in Python 3: u'...' was reintroduced in Python 3.3, but the combined ur'...' form was not (see http://bugs.python.org/issue15096 ).

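The syntax error is, a bit surprisingly, indicated at the end of the string. That is because ru (like ur) is not a recognized prefix, so Python reads it as an ordinary name followed by a separate string literal, and a name directly followed by a string is invalid syntax: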

>>> ru'my string'
  File "<stdin>", line 1
    ru'my string'
                ^
SyntaxError: invalid syntax
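If you want to check which prefixes your interpreter accepts without editing a script each time, here is a small sketch that passes one-line snippets to the built-in compile() and records whether a SyntaxError is raised (the snippet dictionary and its keys are my own; assumes Python 3.3 or later, where u'...' is accepted):

```python
# Compile one-line snippets with different string prefixes and record
# whether the Python 3 parser accepts each of them.
snippets = {
    "u": "s = u'my string'",
    "r": "s = r'my string'",
    "ur": "s = ur'my string'",
    "ru": "s = ru'my string'",
}

results = {}
for prefix, source in snippets.items():
    try:
        compile(source, "<prefix-test>", "exec")
        results[prefix] = "ok"
    except SyntaxError:
        results[prefix] = "SyntaxError"

for prefix, outcome in results.items():
    print(prefix, outcome)
```

On Python 3.3+ this reports u and r as ok and both ur and ru as SyntaxError; the exact error message text varies between Python versions, but the exception type does not.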

So, in Python 3, you can use either:

  • 'my string' or u'my string', which mean the same (the latter was reintroduced in Python 3.3 for compatibility with Python 2 code, see PEP 414 )
  • or r'my string with \backslashes' for a "raw" string.
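Applied to the question, that means keeping the raw string and simply dropping the u. A sketch for Python 3, which also swaps re.sub's arguments into the documented order (text, cleaned and removed are names I chose):

```python
import re

text = 'CHEZ MADU 東久留米店(シェマディ)【東京都東久留米市】'

# The question's pattern with only the u removed; a raw string is already a
# Unicode str in Python 3. Inside a raw string, \】 reaches re unchanged, and
# re treats an escaped non-ASCII character as that literal character.
pattern = re.compile(r'(【(.*?)\】)')

# Correct argument order: pattern, replacement, string
cleaned = re.sub(pattern, '', text)
print(cleaned)

# The inner group still captures the bracketed text, in case it is needed
removed = pattern.search(text).group(2)
print(removed)  # 東京都東久留米市
```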
Thierry Lathuille