I'm converting a C# function to Python. It should be bug-for-bug compatible with the existing function.

This is a regex in that function: http://[a-zA-Z0-9]+(\.[a-zA-Z0-9]+)+([-A-Z0-9a-z_$.+!*()/\\\,:;@&=?~#%]*)*. But Python can't compile it:

>>> re.compile(r"http://[a-zA-Z0-9]+(\.[a-zA-Z0-9]+)+([-A-Z0-9a-z_$.+!*()/\\\,:;@&=?~#%]*)*")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python3.3/re.py", line 214, in compile
    return _compile(pattern, flags)
  File "/usr/lib/python3.3/re.py", line 281, in _compile
    p = sre_compile.compile(pattern, flags)
  File "/usr/lib/python3.3/sre_compile.py", line 498, in compile
    code = _code(p, flags)
  File "/usr/lib/python3.3/sre_compile.py", line 483, in _code
    _compile(code, p.data, flags)
  File "/usr/lib/python3.3/sre_compile.py", line 75, in _compile
    elif _simple(av) and op is not REPEAT:
  File "/usr/lib/python3.3/sre_compile.py", line 362, in _simple
    raise error("nothing to repeat")
sre_constants.error: nothing to repeat

Note: There is a JavaScript version of that regex: /http:\/\/[a-zA-Z0-9]+(\.[a-zA-Z0-9]+)+([-A-Z0-9a-z\$\.\+\!\_\*\(\)\/\,\:;@&=\?~#%]*)*/gi.

I searched for the "nothing to repeat" error, but found nothing. Sorry, this is a duplicate post.

Where is the problem?

000
比尔盖子
  • "I searched about nothing to repeat Error, but got nothing." [Really?](https://www.google.ca/search?q=python+re+nothing+to+repeat) Looks like there are plenty of meaningful results, including the two other SO questions found by the other commenters. – Karl Knechtel Apr 11 '13 at 07:30

2 Answers


I've reproduced the error with:

re.compile(r"([A]*)*")

The problem is that [A]* can match an empty string. Guess what happens when the engine applies ([A]*)* and the inner [A]* matches nothing? "Nothing to repeat". The regex engine doesn't wait for that to actually happen at match time, though. Compilation fails as soon as the pattern makes that scenario possible at all.
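A quick way to see the empty match for yourself. This is only a minimal sketch; the name `inner` is made up for illustration:

```python
import re

inner = re.compile(r"[A]*")

# On an empty string the inner pattern still succeeds, consuming nothing.
print(repr(inner.match("").group(0)))   # ''

# It also succeeds with a zero-length match when the text starts with another character.
print(inner.match("BBB").span())        # (0, 0)
```

Wrapping something that can succeed without consuming input in another `*` is what the compiler is complaining about.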

This should work for you:

r"http://[a-zA-Z0-9]+(\.[a-zA-Z0-9]+)+([-A-Z0-9a-z_$.+!*()/\\\,:;@&=?~#%]*)"

I just removed the last *.
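Here is a small sketch for checking the change. The helper `try_compile` and the sample text are made up for illustration. The original pattern is wrapped in try/except because whether it compiles depends on the Python version you run:

```python
import re

# Pattern from this answer: the original regex with the trailing '*' dropped.
FIXED = r"http://[a-zA-Z0-9]+(\.[a-zA-Z0-9]+)+([-A-Z0-9a-z_$.+!*()/\\\,:;@&=?~#%]*)"
# Pattern from the question, with the nested quantifier ([...]*)*.
ORIGINAL = FIXED + "*"


def try_compile(pattern):
    """Return (compiled_pattern, None) on success or (None, message) on failure."""
    try:
        return re.compile(pattern), None
    except re.error as exc:
        return None, str(exc)


fixed_re, fixed_err = try_compile(FIXED)
original_re, original_err = try_compile(ORIGINAL)

sample = "see http://example.com/path?q=1 for details"  # hypothetical input
match = fixed_re.search(sample)
print(match.group(0))  # http://example.com/path?q=1

# Report what this interpreter does with the original pattern.
print("original compiles:", original_re is not None, original_err)
```

Dropping the outer * should not change which text is matched, since the inner * already allows any number of those characters. What group 2 captures may differ from the C# original, so check that if the port relies on captures.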

000

I had the same error come up with the following regex:

re.compile(r'(?P<term>[0-9]{1,2})-(?P<features>[A-Za-z\:]*)?')

It was the '?' at the end that caused the error. Strictly speaking, '?' is NOT a repeat, and in fact this pattern works just fine (as it should) as of Python 2.7.9. However, the bug is present in Python 2.7.3.
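If you are stuck on an affected version, one workaround is to drop the trailing '?', since the [A-Za-z\:]* group can already match nothing. This is a sketch only; the sample strings are made up:

```python
import re

# Same pattern as above without the trailing '?'.
pattern = re.compile(r'(?P<term>[0-9]{1,2})-(?P<features>[A-Za-z\:]*)')

with_features = pattern.match("12-abc:def")   # hypothetical input
without_features = pattern.match("7-")        # hypothetical input

print(with_features.group("term"), with_features.group("features"))  # 12 abc:def
print(repr(without_features.group("features")))                      # ''
```

The features group comes back as an empty string when nothing follows the dash, so code that tests for a missing group should compare against '' here.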

N6151H