
I've been fighting with this regex for too long now. The split should use a single blank as the separator, but any remaining blanks in a run of blanks should stay attached to the next token:

'123 45   678    123.0'
=>
'123', '45', '  678', '   123.0'

Some of my numbers are floats, and the number of groups is not known in advance.

Mike Pennington
Wolfgang R.

1 Answer


What about using a lookbehind assertion?

>>> import re
>>> regex = re.compile(r'(?<=[^\s])\s')
>>> regex.split('this  is a   string')
['this', ' is', 'a', '  string']

regex breakdown:

(?<=...)  #lookbehind: only match if `...` matches immediately before this position
[^\s]     #Anything that isn't whitespace
\s        #single whitespace character

In English, this translates to "match a single whitespace character if it is preceded by a non-whitespace character."
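As a sketch, the breakdown above can be written directly into the pattern with `re.VERBOSE`, which ignores whitespace and `#` comments inside the pattern (outside character classes). Applied to the input from the question, it yields the requested tokens. The helper name `split_keep_runs` is purely illustrative, and the `float` conversion assumes you want numbers afterwards; `float()` accepts leading whitespace, so the extra blanks do no harm.

```python
import re

# Same pattern as above, spelled out with re.VERBOSE so the
# breakdown lives inside the regex itself.
SPLIT_RE = re.compile(r"""
    (?<=[^\s])  # lookbehind: previous character is not whitespace
    \s          # split on exactly one whitespace character
""", re.VERBOSE)


def split_keep_runs(text):
    """Hypothetical helper: split on the first blank of each run of blanks;
    the remaining blanks stay at the front of the next token."""
    return SPLIT_RE.split(text)


tokens = split_keep_runs('123 45   678    123.0')
print(tokens)                      # ['123', '45', '  678', '   123.0']
print([float(t) for t in tokens])  # [123.0, 45.0, 678.0, 123.0]
```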

Or you can use a negative lookbehind assertion:

regex = re.compile(r'(?<!\s)\s')

which might be slightly nicer (as suggested in the comments). It is the mirror image of the version above, so it should be easy to see how it works: match a whitespace character that is not preceded by whitespace.
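One small difference between the two forms is worth knowing: a negative lookbehind also succeeds at the very start of the string, where there is no preceding character, whereas the positive lookbehind needs an actual non-whitespace character there. A minimal comparison (the variable names are just for illustration):

```python
import re

positive = re.compile(r'(?<=[^\s])\s')  # previous char must exist and be non-whitespace
negative = re.compile(r'(?<!\s)\s')     # previous char must not be whitespace (or not exist)

# On the input from the question, both patterns give the same result.
sample = '123 45   678    123.0'
print(positive.split(sample) == negative.split(sample))  # True

# With leading whitespace they differ: the negative lookbehind also
# matches at position 0, producing an empty first element.
print(positive.split(' 123 45'))  # [' 123', '45']
print(negative.split(' 123 45'))  # ['', '123', '45']
```

If your input can start with blanks, pick the variant whose behaviour you want there.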

mgilson