
I want to match some digits preceded by a non-digit or at the start of the string.

Since the caret does not act as a start-of-string anchor inside brackets, I can't use it there, so I checked the reference and found the alternative form \A.

However, when I try to use it I get an error:

>>> import re
>>> s = '123'
>>> re.findall('[\D\A]\d+', s)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/re.py", line 177, in findall
    return _compile(pattern, flags).findall(string)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/re.py", line 245, in _compile
    raise error, v # invalid expression
sre_constants.error: internal: unsupported set operator

What am I doing wrong?

Christoph Wurm
  • Can't you just use the caret outside the brackets, like so? `^[A-Za-z]+?` Also, it's not strictly true that carets have no special meaning inside brackets. If a caret is the first character inside the brackets, it negates the set: `[^...]` matches any character except those listed. – Joel Cornett Mar 22 '12 at 16:33
  • "some digits preceded by a non-digit or at the start of the string" - doesn't that mean all digits? Just use `\d+`... – Izkata Mar 22 '12 at 16:36
  • @Izkata: The real use case is more complicated. This is just a simplification. – Christoph Wurm Mar 22 '12 at 19:37
  • I actually have almost the same problem: http://stackoverflow.com/questions/16257370/im-trying-to-get-proxies-using-regex-python-out-of-a-web-page – Teli Kaufman Apr 27 '13 at 22:11

2 Answers


You can use a negative lookbehind:

(?<!\d)\d+

Your problem is that you are using \A (a zero-width assertion) inside a character class, which can only match a single character. You could write it as (?:\D|\A) instead, but a lookbehind is nicer.
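As a rough illustration of the two forms mentioned above, here is a minimal sketch in Python. The sample string and variable names are made up for the example. Note that the alternation consumes the preceding non-digit, so a capturing group is needed to get only the digits back.

```python
import re

s = "abc123 45x6"  # hypothetical sample input

# Negative lookbehind: a run of digits not preceded by another digit.
# The lookbehind is zero-width, so only the digits are part of the match.
lookbehind = re.findall(r"(?<!\d)\d+", s)

# Alternation: either a non-digit or the start of the string, then digits.
# The non-digit is consumed, so capture the digits in a group.
alternation = re.findall(r"(?:\D|\A)(\d+)", s)

# Without the capturing group, the consumed non-digit shows up in the result.
without_group = re.findall(r"(?:\D|\A)\d+", "abc123")

# Both forms also match at the very start of the string.
at_start_lookbehind = re.findall(r"(?<!\d)\d+", "123")
at_start_alternation = re.findall(r"(?:\D|\A)(\d+)", "123")

print(lookbehind, alternation, without_group)
```

For this input both approaches find the same digit runs. The lookbehind version avoids the extra group, which is probably why it reads as the nicer option.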

Qtax

Repetition in regular expressions is greedy by default, so using re.findall() with the regex \d+ will get you exactly what you want:

re.findall(r'\d+', s)

As a side note, you should be using raw strings when writing regular expressions to make sure the backslashes are interpreted properly.
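To show why raw strings matter, here is a small sketch using \b, which Python itself treats as a backspace escape in an ordinary string but which the regex engine treats as a word boundary. The sample text and variable names are assumptions made for the example.

```python
import re

text = "a cat sat"  # hypothetical sample input

# Without the r prefix, Python turns "\b" into a single backspace character
# before the regex engine ever sees the pattern.
plain = "\bcat\b"

# With the r prefix, the backslash is passed through unchanged, so the
# regex engine interprets \b as a word boundary.
raw = r"\bcat\b"

plain_matches = re.findall(plain, text)
raw_matches = re.findall(raw, text)

print(len(plain), len(raw))
print(plain_matches, raw_matches)
```

Escapes such as \d happen to survive in an ordinary string because Python does not recognize them, but relying on that is fragile, so the r prefix is the safer habit.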

Andrew Clark
  • In this simplified use case, yes. In the real use case, where matching the start of the string is really necessary, this is not possible. Thanks for the raw strings hint. – Christoph Wurm Mar 22 '12 at 19:39