Split by regex without resulting empty strings in Python

Question

I want to split a string containing an irregularly repeating delimiter, the way the split() method does:

>>> ' a b   c  de  '.split()
['a', 'b', 'c', 'de']

However, when I split by a regular expression, the result is different (empty strings sneak into the resulting list):

>>> import re
>>> re.split(r'\s+', ' a b   c  de  ')
['', 'a', 'b', 'c', 'de', '']
>>> re.split(r'\.+', '.a.b...c..de..')
['', 'a', 'b', 'c', 'de', '']

And what I want to see:

>>> some_smart_split_method('.a.b...c..de..')
['a', 'b', 'c', 'de']

Also have a look at [Why are empty strings returned in split() results?](http://stackoverflow.com/questions/2197451/why-are-empty-strings-returned-in-split-results) – stema, Jun 19 '15 at 08:19

score 20 · Accepted Answer · edited May 23 '17 at 12:22

The empty strings are an unavoidable result of the regex split whenever the delimiter matches at the very start or end of the string (though there is good reasoning as to why that behavior might be desirable). To get rid of them, you can call filter() on the result.

results = re.split(...)
results = list(filter(None, results))

Note that the list() call is only necessary in Python 3: in Python 2, filter() returns a list, while in Python 3 it returns a lazy filter object.
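As a self-contained sketch of the approach above, using the inputs from the question (the helper name `split_nonempty` is hypothetical, not part of the `re` module):

```python
import re

def split_nonempty(pattern, string):
    """Split string on pattern and drop the empty strings re.split leaves behind."""
    # filter(None, ...) keeps only truthy items, i.e. the non-empty strings.
    return list(filter(None, re.split(pattern, string)))

print(split_nonempty(r'\s+', ' a b   c  de  '))   # ['a', 'b', 'c', 'de']
print(split_nonempty(r'\.+', '.a.b...c..de..'))   # ['a', 'b', 'c', 'de']
```

Wrapping the result in list() keeps the behavior the same on Python 2 and Python 3.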

edited May 23 '17 at 12:22 by Community · answered Jun 19 '15 at 08:16 by Walker

Is there a way then to split with a limited number of splits? `split('.a.b...c', 1)` → `['a', 'b...c']`, `split('a.b...c', 1)` → `['a', 'b...c']` – Roman Jun 19 '15 at 08:56

Yes, actually, pretty much exactly as you wrote it. For example, re.split(regex, string, maxsplit=1) will stop splitting after the first match. You can see more in the Python regex docs, [here](https://docs.python.org/2/library/re.html#re.split). – Walker Jun 19 '15 at 14:11
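One caveat with `maxsplit`: a delimiter at the very start of the string still counts as a split, so `re.split(r'\.+', '.a.b...c', maxsplit=1)` gives `['', 'a.b...c']` rather than the `['a', 'b...c']` asked for. A minimal sketch of one possible workaround, assuming the pattern is passed as a string; `split_limited` is a hypothetical helper name, not part of `re`:

```python
import re

def split_limited(pattern, string, maxsplit):
    """Split at most maxsplit times, ignoring a delimiter at the very start."""
    # Remove a leading delimiter so it does not use up one of the allowed splits.
    # \A anchors the match to the start of the string.
    stripped = re.sub(r'\A(?:%s)' % pattern, '', string, count=1)
    parts = re.split(pattern, stripped, maxsplit=maxsplit)
    # Drop any empty strings that remain (e.g. when the input ends with a delimiter).
    return list(filter(None, parts))

print(split_limited(r'\.+', '.a.b...c', 1))   # ['a', 'b...c']
print(split_limited(r'\.+', 'a.b...c', 1))    # ['a', 'b...c']
```

The remainder after the last allowed split is returned untouched, so any delimiters inside it are preserved.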

score 18 · Answer 2 · answered Jun 19 '15 at 08:19

>>> re.findall(r'\S+', ' a b   c  de  ')
['a', 'b', 'c', 'de']
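The same idea, matching the pieces rather than the delimiters, appears to carry over to the question's dot-separated example. This sketch assumes the delimiter is a single character that can be negated in a character class; the variable names are illustrative only:

```python
import re

# Match runs of non-delimiter characters instead of splitting on delimiters.
tokens_ws = re.findall(r'\S+', ' a b   c  de  ')
tokens_dot = re.findall(r'[^.]+', '.a.b...c..de..')

print(tokens_ws)    # ['a', 'b', 'c', 'de']
print(tokens_dot)   # ['a', 'b', 'c', 'de']
```

Because findall only returns what matched, there are no empty strings to filter out afterwards.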

answered Jun 19 '15 at 08:19 by dlask

That's an awesome solution. A lot of posts suggest using groups instead, but findall does everything. – Vadim Kirilchuk May 20 '19 at 09:02
