python regex: use first blank as sep but maintain rest of blank sequence

Question

I've been fighting with this regex for too long now. The split should use a blank as the separator, but keep the remaining blanks of a blank sequence attached to the next token:

'123 45   678    123.0'
=>
'123', '45', '  678', '   123.0'

Some of my numbers are floats, and the number of groups is unknown.

mgilson · Accepted Answer · 2012-11-23T15:52:47.063

What about using a lookbehind assertion?

>>> import re
>>> regex = re.compile(r'(?<=[^\s])\s')
>>> regex.split('this  is a   string')
['this', ' is', 'a', '  string']

regex breakdown:

(?<=...)  #lookbehind.  Only match if the `...` matches immediately before the current position
[^\s]     #Anything that isn't whitespace
\s        #single whitespace character

In English, this translates to "match a single whitespace character if it is preceded by a non-whitespace character."
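
As a quick check against the input from the question, here is a small sketch that applies the same pattern to that string. The names `sep`, `line`, `tokens`, and `values` are only illustrative, and the `float()` conversion is an assumption about what happens to the tokens afterwards, not something the question states.

```python
import re

# Same pattern as above: one whitespace character preceded by non-whitespace.
sep = re.compile(r'(?<=[^\s])\s')

line = '123 45   678    123.0'
tokens = sep.split(line)
print(tokens)  # ['123', '45', '  678', '   123.0']

# float() ignores surrounding whitespace, so padded tokens still convert.
values = [float(t) for t in tokens]
print(values)  # [123.0, 45.0, 678.0, 123.0]
```

Only the first blank of each run is consumed as the separator; the rest stay at the front of the following token.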

Or you can use a negative lookbehind assertion:

regex = re.compile(r'(?<!\s)\s')

which might be slightly nicer (as suggested in the comments), and it works in much the same way as the pattern above.
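
One place where the two patterns can differ is a string that starts with whitespace: the positive lookbehind needs an actual preceding character, while the negative one also matches at the very start of the string. A minimal sketch with made-up sample strings (the names `positive`, `negative`, `s`, and `t` are illustrative only):

```python
import re

positive = re.compile(r'(?<=[^\s])\s')  # requires a non-whitespace char before
negative = re.compile(r'(?<!\s)\s')     # requires that no whitespace comes before

# No leading whitespace: both patterns split identically.
s = 'this  is a   string'
print(positive.split(s) == negative.split(s))  # True

# Leading whitespace: only the negative lookbehind matches at index 0.
t = ' a b'
print(positive.split(t))  # [' a', 'b']
print(negative.split(t))  # ['', 'a', 'b']
```

If the input may begin with blanks, which variant fits better depends on whether that leading blank should count as a separator.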

edited Nov 23 '12 at 15:52

answered Nov 23 '12 at 15:44


You can use `(?<!\s)` to simplify – Diego Nov 23 '12 at 15:50
@Diego -- Thanks, I've added that alternative as well – mgilson Nov 23 '12 at 15:54
