13

I want to split a string that contains an irregularly repeating delimiter, the way str.split() does:

>>> ' a b   c  de  '.split()
['a', 'b', 'c', 'de']

However, when I split with a regular expression, the result is different (empty strings sneak into the resulting list):

>>> import re
>>> re.split(r'\s+', ' a b   c  de  ')
['', 'a', 'b', 'c', 'de', '']
>>> re.split(r'\.+', '.a.b...c..de..')
['', 'a', 'b', 'c', 'de', '']

And what I want to see:

>>> some_smart_split_method('.a.b...c..de..')
['a', 'b', 'c', 'de']
Roman
  • 1
    Also have a look at [Why are empty strings returned in split() results?](http://stackoverflow.com/questions/2197451/why-are-empty-strings-returned-in-split-results) – stema Jun 19 '15 at 08:19

2 Answers

20

The empty strings are an expected result of the regex split: re.split() returns an empty string wherever the pattern matches at the very start or end of the input (and there is good reasoning as to why that behavior might be desirable). To get rid of them, you can call filter() on the result.

results = re.split(...)
results = list(filter(None, results))

Note that the list() call is only necessary in Python 3: in Python 2, filter() returns a list, while in Python 3 it returns a lazy filter object.
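For completeness, here is a minimal runnable sketch of the same idea applied to the two inputs from the question. The helper name split_nonempty is only an illustration, not something from the re module.

```python
import re

def split_nonempty(pattern, text):
    """Split text on pattern and drop the empty strings re.split() leaves behind."""
    # split_nonempty is a hypothetical helper name used for illustration.
    # filter(None, ...) keeps only truthy items, i.e. the non-empty strings.
    return list(filter(None, re.split(pattern, text)))

print(split_nonempty(r'\s+', ' a b   c  de  '))  # ['a', 'b', 'c', 'de']
print(split_nonempty(r'\.+', '.a.b...c..de..'))  # ['a', 'b', 'c', 'de']
```

Wrapping the result in list() keeps the behavior the same on Python 2 and Python 3.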

Walker
  • Is there a way, then, to split with a limited number of splits? `>>>split('.a.b...c', 1) ['a', 'b...c'] >>>split('a.b...c', 1) ['a', 'b...c']` – Roman Jun 19 '15 at 08:56
  • 1
    Yes, actually, pretty much exactly as you wrote it. For example, re.split(regex, string, maxsplit=1) will stop splitting after the first match. You can see more in the Python regex docs, [here](https://docs.python.org/2/library/re.html#re.split). – Walker Jun 19 '15 at 14:11
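One caveat with a limited split: if the string starts with the delimiter, the first (and only) split happens there, so re.split(r'\.+', '.a.b...c', maxsplit=1) returns ['', 'a.b...c'] rather than ['a', 'b...c']. A possible workaround, sketched below, is to strip the delimiter from both ends before splitting. The helper split_limited is a made-up name, and the sketch assumes the delimiter is a single literal character.

```python
import re

def split_limited(text, delimiter='.', maxsplit=0):
    """Split on runs of a single-character delimiter, ignoring runs at either end."""
    # split_limited is a hypothetical helper; delimiter is assumed to be one character.
    # Strip delimiters at both ends so the first split does not yield ''.
    trimmed = text.strip(delimiter)
    # maxsplit=0 means "no limit", matching re.split()'s own convention.
    parts = re.split(re.escape(delimiter) + '+', trimmed, maxsplit=maxsplit)
    # Drop the lone '' produced when the input consists only of delimiters.
    return [part for part in parts if part]

print(split_limited('.a.b...c', '.', maxsplit=1))  # ['a', 'b...c']
print(split_limited('a.b...c', '.', maxsplit=1))   # ['a', 'b...c']
```

Note that the unsplit remainder keeps its inner delimiters, as in the output requested in the comment above.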
18
>>> re.findall(r'\S+', ' a b   c  de  ')
['a', 'b', 'c', 'de']
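The same approach should carry over to other delimiters: instead of describing what separates the tokens, match maximal runs of characters that are not delimiters. A small sketch follows; find_tokens is a hypothetical helper name, and it assumes the delimiters can be given as a set of single characters.

```python
import re

def find_tokens(text, delimiters):
    """Return maximal runs of characters that are not in the delimiter set."""
    # find_tokens is a hypothetical helper name used for illustration.
    # re.escape() keeps characters such as ']' or '^' literal inside the class.
    pattern = '[^' + re.escape(delimiters) + ']+'
    return re.findall(pattern, text)

print(find_tokens('.a.b...c..de..', '.'))   # ['a', 'b', 'c', 'de']
print(find_tokens(' a b   c  de  ', ' '))   # ['a', 'b', 'c', 'de']
```

Because findall() only returns actual matches, no empty strings appear and no filtering step is needed.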
dlask