
I am making several regex substitutions in Python along the lines of

  \w\s+\w  

over many large documents. Making the quantifier non-greedy (with a ?) obviously won't change what it matches, since \w and \s never match the same character, but will it make the code run any faster? In other words, with a non-greedy regex does Python work forwards from the first character matched, rather than from the end of the document back to that character, or is this a naive view?
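To check the claim that the two forms match the same text, here is a minimal sketch. The sample string and the variable names are only illustrative assumptions, not taken from the real documents.

```python
import re

# Greedy and non-greedy versions of the same pattern.
greedy = re.compile(r'\w\s+\w')
lazy = re.compile(r'\w\s+?\w')

# Illustrative sample text (an assumption, not one of the real documents).
sample = 'some text   with \tspaces  between'

# Collect the (start, end) span of every match for each pattern.
greedy_spans = [m.span() for m in greedy.finditer(sample)]
lazy_spans = [m.span() for m in lazy.finditer(sample)]

# \s and \w never match the same character, so the whitespace run has to be
# consumed in full either way and the spans should be identical.
same_matches = greedy_spans == lazy_spans
print(same_matches)
```

If the spans ever differed, the two patterns would not be interchangeable and comparing their speed would be beside the point.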

Alan Moore
Barry

1 Answer


Is this the pattern you meant?

In [15]: s = 'some text   with \tspaces  between'

In [16]: timeit re.sub(r'(\w)(\s+)(\w)', '\\1 \\3', s)
10000 loops, best of 3: 30.5 us per loop

In [17]: timeit re.sub(r'(\w)(\s+?)(\w)', '\\1 \\3', s)
10000 loops, best of 3: 24.9 us per loop

Seems to be a pretty small difference here: the non-greedy version is only about 5.6 microseconds faster per call.

Using a 500-word lorem ipsum with runs of mixed whitespace between every word, I get an 8 ms difference.
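If you want to repeat the comparison outside IPython, something like the following sketch with the standard `timeit` module should work. The word list, the whitespace separator, and the repeat count are my own assumptions rather than the exact text used above, so expect different absolute numbers.

```python
import re
import timeit

# Precompile both variants so compilation cost is not part of the timing.
greedy = re.compile(r'(\w)(\s+)(\w)')
lazy = re.compile(r'(\w)(\s+?)(\w)')

# Hypothetical 500-word document with mixed whitespace between every word.
words = ['lorem', 'ipsum', 'dolor', 'sit', 'amet']
text = ' \t  '.join(words * 100)

def run(pattern):
    # Collapse each whitespace run between two word characters to one space.
    return pattern.sub(r'\1 \3', text)

# Time 200 substitutions over the whole document for each variant.
greedy_time = timeit.timeit(lambda: run(greedy), number=200)
lazy_time = timeit.timeit(lambda: run(lazy), number=200)

print('greedy: %.4f s' % greedy_time)
print('lazy:   %.4f s' % lazy_time)
```

The timings depend on the machine, the Python version, and the input, so it is worth running this against one of your own documents before deciding either way.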

jdi