I am trying to use regular expressions in Python to find carriage returns (\r) in a string that are not followed by a line feed (\n), which likely indicates an error.

However, my regex always matches, and I do not know why:

>>> import re
>>> cr = re.compile("\r[^\n]", re.X)
>>> rr = cr.search("bla")
>>> rr
<_sre.SRE_Match object at 0x0000000003793AC0>

What would the correct syntax look like?

aldorado

2 Answers

You can use a negative lookahead:

\r(?!\n)

In Python:

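import re
rx = r'\r(?!\n)'
string = "a\r\nb\rc"  # example input (an assumption); use your own text here
for match in re.finditer(rx, string):
    print(match.start())  # offset of each carriage return not followed by \n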
Jan

You're using verbose mode (re.X) and a non-raw string. That means your regex has a literal carriage return and line feed in it. As these are whitespace characters and you're in verbose mode, the carriage return outside a character class is completely ignored. Your regex is effectively r'[^\n]', matching any non-line feed character.
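To see this effect in isolation, here is a small sketch that compiles the same non-raw pattern with and without re.X. The names `verbose`, `plain` and `verbose_match` are only illustrative and do not come from the question.

```python
import re

# Non-raw string: Python turns \r and \n into real control characters
# before the regex engine ever sees the pattern.
verbose = re.compile("\r[^\n]", re.X)
plain = re.compile("\r[^\n]")

# In verbose mode the bare carriage return is dropped as whitespace,
# so only the character class [^\n] is left and "bla" matches at "b".
verbose_match = verbose.search("bla")
print(verbose_match.group() if verbose_match else None)

# Without re.X the carriage return stays in the pattern,
# so "bla" (which contains no \r) does not match.
print(plain.search("bla"))
```

The line feed survives in both cases because whitespace inside a character class is kept even in verbose mode.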

Use a raw string. As others suggested, a negative lookahead would also be better for a "not followed by" assertion:

r'\r(?!\n)'
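As a rough sketch of how that pattern could be used, the snippet below collects the offsets of all lone carriage returns. The helper `lone_cr_offsets`, the constant `LONE_CR` and the `sample` text are hypothetical names made up for this example.

```python
import re

# Raw string: the regex engine receives the escapes \r and \n itself,
# so the pattern also behaves the same if re.X is added.
LONE_CR = re.compile(r'\r(?!\n)')

def lone_cr_offsets(text):
    """Return the index of every carriage return not followed by a line feed."""
    return [m.start() for m in LONE_CR.finditer(text)]

# Hypothetical sample: one proper \r\n pair, one lone \r in the middle,
# and one lone \r at the very end of the string.
sample = "ok\r\nbad\rline\r"
print(lone_cr_offsets(sample))
```

One difference from a consuming pattern such as r'\r[^\n]' is that the lookahead also reports a carriage return at the very end of the string, because it does not need a following character to match.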
user2357112