1

I needed a regex pattern to catch any 16 digit string of numbers (each four number group separated by a hyphen) without any number being repeated more than 3 times, with or without hyphens in between.

So the pattern I wrote is

a=re.compile(r'(?!(\d)\-?\1\-?\1\-?\1)(^d{4}\-?\d{4}\-?\d{4}\-?\d{4}$)')

But the example "5133-3367-8912-3456" gets matched even when 3 is repeated 4 times. (What is the problem with the negative lookahead section?)

Vaish MK
  • 281
  • 1
  • 4
  • 11
  • Do you need to use regex at all? It seems like this would be easier to write and understand using `str.split` and a `collections.Counter`. – jonrsharpe Apr 09 '17 at 07:51

1 Answers1

2

Lookaheads only do the check at the position they are at, so in your case at the start of the string. If you want a lookahead to basically check the whole string, if a certain pattern can or can't be matched, you can add .* in front to make go deeper into the string.

In your case, you could change it to r'(?!.*(\d)\-?\1\-?\1\-?\1)(^d{4}\-?\d{4}\-?\d{4}\-?\d{4}$)'.

There is also no need to escape the minus at the position they are at and I would move the lookahead right after the ^. I don't know how well python regexes are optimized, but that way the start of the string anchor is matched first (only 1 valid position) instead of checking the lookahead at any place just to fail the match at ^. This would give r'^(?!.*(\d)-?\1-?\1-?\1)(\d{4}-?\d{4}-?\d{4}-?\d{4}$)'

Sebastian Proske
  • 8,255
  • 2
  • 28
  • 37