python split without creating blanks

Question

Thanks to this question I understand why it is important that split creates blanks, but sometimes it is necessary not to keep them.

Let's say you parsed some CSS and got the following strings:

s1 = 'background-color:#000;color:#fff;border:1px #ccc dotted;'
s2 = 'color:#000;background-color:#fff;border:1px #333 dotted'

Both are valid CSS even though the second string lacks a semicolon at the end. When splitting the strings, you get the following:

>>> s1.split(';')
['background-color:#000', 'color:#fff', 'border:1px #ccc dotted', '']
>>> s2.split(';')
['color:#000', 'background-color:#fff', 'border:1px #333 dotted']

The extra semicolon at the end of s1 creates a blank item in the list. Now if I want to manipulate the lists further, I would need to test the beginning and end of each list and remove any blank items. That is not too bad, but it seems avoidable.

question:

Is there a method that is essentially the same as split but does not include trailing blank items? Or is there simply a way to remove them, just as a string has strip() to remove leading and trailing whitespace?
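For comparison, str.split() called with no separator splits on runs of whitespace and never returns empty strings, so the blanks only appear when an explicit separator such as ';' is passed. A minimal demonstration:

```python
# With an explicit separator, consecutive or trailing separators yield empty strings
print('a;b;;c;'.split(';'))   # ['a', 'b', '', 'c', '']

# With no separator, runs of whitespace count as a single separator,
# and leading/trailing whitespace produces no empty strings
print(' a  b c '.split())     # ['a', 'b', 'c']
```

This only helps for whitespace-delimited data; with ';' as the separator, one of the approaches in the answers is still needed.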

Martijn Pieters · Answer 1 · 2013-07-09T09:11:31.053

score 13

Simply remove the empty items by passing None as the function to filter():

list(filter(None, s1.split(';')))

Demo:

>>> s1 = 'background-color:#000;color:#fff;border:1px #ccc dotted;'
>>> list(filter(None, s1.split(';')))
['background-color:#000', 'color:#fff', 'border:1px #ccc dotted']

Calling filter() with None removes every item that evaluates to false in a boolean context: empty strings, None, numeric 0, and so on. On Python 3, filter() returns a lazy iterator, so wrap it in list() when you need a list; on Python 2 it returns a list directly.
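A short sketch showing that the None filter drops every falsy item, not only empty strings, which matters if the list can hold values other than strings (the mixed list below is made up for illustration):

```python
items = ['color:#000', '', None, 0, 'border:1px', 0.0, []]

# None as the function keeps only the truthy items
kept = list(filter(None, items))
print(kept)  # ['color:#000', 'border:1px']

# Passing bool gives the same result, as does a list comprehension
assert kept == list(filter(bool, items)) == [i for i in items if i]
```

For the output of str.split(), every item is a string, so only the empty strings are removed.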

filter(None, ...) eats list comprehensions for breakfast:

>>> import timeit
>>> timeit.timeit('filter(None, a)', "a = [1, 2, 3, None, 4, 'five', ''] * 100")
9.410392045974731
>>> timeit.timeit('[i for i in a if i]', "a = [1, 2, 3, None, 4, 'five', ''] * 100")
44.9318630695343
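The timings above were most likely taken on Python 2, where filter() builds a list. On Python 3, timing the bare filter(None, a) would only measure creating the lazy iterator, so a like-for-like comparison wraps it in list(). A sketch for rerunning the comparison (number=1_000 is an arbitrary choice to keep the run short; results depend on the machine and interpreter version, so no figures are given here):

```python
import timeit

setup = "a = [1, 2, 3, None, 4, 'five', ''] * 100"

# On Python 3, filter() is lazy; list() forces the full pass so that
# both statements actually build the filtered list.
t_filter = timeit.timeit('list(filter(None, a))', setup, number=1_000)
t_listcomp = timeit.timeit('[i for i in a if i]', setup, number=1_000)

print(f'filter:   {t_filter:.4f} s')
print(f'listcomp: {t_listcomp:.4f} s')
```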

edited Jul 09 '13 at 09:11

answered Jul 09 '13 at 07:12


Isn't filter slower than a list comprehension? (I'm not saying that my answer is better than yours, I'm just wondering) – TerryA Jul 09 '13 at 07:16
Although all answers work, this answer is the one that actually answers my question rather than simply making it work for my situation. Thanks. I am going to bed now, but I will accept the answer in the morning – Ryan Saxe Jul 09 '13 at 07:17
@Haidro: `filter()` with `None` is *one* C operation, no Python looping done. I'd say it is faster. :-) – Martijn Pieters Jul 09 '13 at 07:39
I'm not sure about performance, but GvR was thinking about getting rid of it, along with other lispy built-ins like `map` and `reduce`. In Python 3 `reduce` was moved to the `functools` module. I like the lispies but it is not idiomatic Python. – Paulo Scardine Jul 09 '13 at 07:43
@PauloScardine Someone linked this to me the other day about it: http://www.artima.com/weblogs/viewpost.jsp?thread=98196 – TerryA Jul 09 '13 at 07:46
@PauloScardine: Yes, Guido had an aversion to functional programming constructs until others convinced him of their use. `reduce` was the only victim in the end, and it was moved to `functools` rather than be removed altogether. – Martijn Pieters Jul 09 '13 at 07:50
@Haidro: I tested `filter(None, ...)` versus a list comp and the filter was 5 times as fast. – Martijn Pieters Jul 09 '13 at 09:10

score 7 · Answer 2 · answered Jul 09 '13 at 07:12

You can use a list comprehension to filter out the empty strings, because an empty string evaluates to False in a boolean context:

>>> s1 = 'background-color:#000;color:#fff;border:1px #ccc dotted;'
>>> [i for i in s1.split(';') if i]
['background-color:#000', 'color:#fff', 'border:1px #ccc dotted']
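If the declarations may also be separated by whitespace (for example 'color:#000; border:0; ', where s3 is a made-up string), the last item is ' ' rather than '' and the plain truthiness test keeps it. A hedged variation, assuming whitespace-only items should also be dropped and the remaining items trimmed:

```python
s3 = 'color:#000; border:0; '

# A plain truthiness test keeps the whitespace-only last item
print([i for i in s3.split(';') if i])   # ['color:#000', ' border:0', ' ']

# Strip each item first, then keep only the non-empty results
print([i.strip() for i in s3.split(';') if i.strip()])  # ['color:#000', 'border:0']
```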

Alternatively, you can rstrip() the semicolon first:

>>> s1.rstrip(';').split(';')
['background-color:#000', 'color:#fff', 'border:1px #ccc dotted']
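Note that rstrip() only trims the right-hand end, so a leading semicolon still produces an empty first item, whereas strip() trims both ends. A quick check on a made-up string:

```python
s = ';color:#000;border:0;'

# rstrip() leaves the leading ';', so split() yields an empty first item
print(s.rstrip(';').split(';'))  # ['', 'color:#000', 'border:0']

# strip() removes semicolons from both ends
print(s.strip(';').split(';'))   # ['color:#000', 'border:0']
```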

Ashwini Chaudhary · Answer 3 · 2013-07-09T07:24:43.890

Apply str.strip to the string before doing the split:

>>> s1 = 'background-color:#000;color:#fff;border:1px #ccc dotted;'
>>> s1.strip(';').split(';')
['background-color:#000', 'color:#fff', 'border:1px #ccc dotted']

Works for both leading and trailing ';':

>>> s1 = ';background-color:#000;color:#fff;border:1px #ccc dotted;'
>>> s1.strip(';').split(';')
['background-color:#000', 'color:#fff', 'border:1px #ccc dotted']

I am not sure why you would want to avoid this, as stripping before splitting is faster than both the list comprehension and filter() in this IPython %timeit benchmark:

>>> s1 = ';background-color:#000;color:#fff;border:1px #ccc dotted;'*1000
>>> %timeit filter(None, s1.split(';'))
1000 loops, best of 3: 638 us per loop
>>> %timeit s1.strip(';').split(';')
1000 loops, best of 3: 570 us per loop
>>> %timeit [i for i in s1.split(';') if i]
100 loops, best of 3: 931 us per loop
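One caveat about this benchmark: each repeated chunk starts and ends with ';', so the joined string contains ';;' at every boundary between chunks, and strip(';') only removes semicolons at the two ends of the whole string. The strip-and-split result therefore still contains empty strings in the middle, so it is not doing the same work as the other two. A sketch that checks this (999 follows from 1000 repetitions having 999 joints):

```python
s1 = ';background-color:#000;color:#fff;border:1px #ccc dotted;' * 1000

stripped = s1.strip(';').split(';')
filtered = list(filter(None, s1.split(';')))

# strip() only trims the ends, so every ';;' joint leaves an empty string
print(stripped.count(''))   # 999
print(filtered.count(''))   # 0
print(len(filtered))        # 3000
```

A like-for-like comparison would need to filter the stripped result as well, or benchmark on a string without interior ';;'.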
