49

I have a formatted string from a log file, which looks like:

>>> a="test                            result"

That is, the test and the result are separated by a run of spaces. The line was probably produced by a format string that pads test to a fixed width.

Simple splitting won't do the trick:

>>> a.split(" ")
['test', '', '', '', ... '', '', '', '', '', '', '', '', '', '', '', 'result']

split(DELIMITER, COUNT) gets rid of the empty strings, but leaves the padding attached to the second field:

>>> a.split(" ",1)
['test', '                           result']

This helped - but of course, I really need:

['test', 'result']

I can use split() followed by map + strip(), but I wondered if there is a more Pythonic way to do it.

Thanks,

Adam

UPDATE: Such a simple solution! Thank you all.

Martin Thoma
Adam Matan

6 Answers

86

Just don't pass any delimiter:

>>> a="test                            result"
>>> a.split()
['test', 'result']
Kimvais
  • 18
    As for why this works: a.split(None) is a special case, which in Python means "split on one or more whitespace chars". re.split() is the general case solution. – Gregg Lind Mar 22 '10 at 13:23
  • 1
    One needs to use str.split(None, maxsplit) since the function does not accept keyword arguments. I wonder why. – tbrittoborges May 28 '15 at 12:53
  • 2
    The question was how to split on delimiter+ (one or more). Your answer says that any whitespace will be taken as the delimiter, which is not the correct answer. – Risinek Apr 08 '16 at 10:11
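The distinction raised in these comments can be sketched as follows, assuming Python 3. With no argument, split() treats any run of whitespace (spaces, tabs, newlines) as one separator, while a pattern such as " +" collapses runs of the space character only. The sample string `line` is made up for illustration.

```python
import re

# Hypothetical sample: a run of spaces, then a tab, between the fields.
line = "test     result\tdetails"

# No separator: any run of whitespace (spaces, tabs, newlines) is one delimiter.
by_whitespace = line.split()

# Regex " +": only runs of the space character are delimiters; the tab stays.
by_spaces = re.split(" +", line)

# In Python 3, maxsplit can be passed by keyword while sep keeps its default.
first_only = line.split(maxsplit=1)

print(by_whitespace)  # ['test', 'result', 'details']
print(by_spaces)      # ['test', 'result\tdetails']
print(first_only)     # ['test', 'result\tdetails']
```

If the fields themselves may contain tabs or other whitespace that must be preserved, the regex form is probably the safer choice.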
48
>>> import re
>>> a="test                            result"
>>> re.split(" +",a)
['test', 'result']

>>> a.split()
['test', 'result']
ghostdog74
  • 1
    Cool. Might help with other, non-whitespace delimiters. – Adam Matan Mar 22 '10 at 13:23
  • 1
    re.split('\W+',mystring) is closer to an equivalent of string.split(None). – Gregg Lind Mar 22 '10 at 13:27
  • 10
    This is the only answer to the actual request, "split by 1 or more occurrences of a delimiter". – Mark E. Haase Apr 18 '13 at 03:02
  • 1
    this should be accepted Answer.... The other ones are not answering the real question... – Risinek Apr 08 '16 at 10:09
  • `re.split()` gives me an extra token if the string ends with a space. – BarathVutukuri Jun 11 '19 at 08:49
  • @BarathVutukuri that is the correct behavior of a `split` function. If the input sequence ends with a delimiter, there is an empty term after that delimiter. Java's handling of this case is out of the ordinary: its API documentation specifically says it discards trailing empty terms (but not leading ones) when no term limit is applied. Python, JavaScript, and C# do not discard trailing terms. – theferrit32 Feb 10 '20 at 21:23
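A short sketch of the trailing-delimiter case discussed in the last two comments, with two possible ways around it. The input `line` is a made-up example; which workaround fits depends on whether empty fields in the middle of the string should be kept.

```python
import re

# Hypothetical log line that ends with trailing spaces.
line = "test     result   "

# The trailing run of spaces is itself a delimiter, so an empty string follows it.
raw = re.split(" +", line)

# Option 1: strip the delimiter character from both ends before splitting.
stripped = re.split(" +", line.strip(" "))

# Option 2: split first, then drop the empty fields.
filtered = [field for field in raw if field]

print(raw)       # ['test', 'result', '']
print(stripped)  # ['test', 'result']
print(filtered)  # ['test', 'result']
```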
23

Just this should work:

a.split()

Example:

>>> 'a      b'.split(' ')
['a', '', '', '', '', '', 'b']
>>> 'a      b'.split()
['a', 'b']

From the documentation:

If sep is not specified or is None, a different splitting algorithm is applied: runs of consecutive whitespace are regarded as a single separator, and the result will contain no empty strings at the start or end if the string has leading or trailing whitespace. Consequently, splitting an empty string or a string consisting of just whitespace with a None separator returns [].
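The edge cases named in that passage can be checked directly. This is only an illustrative sketch; the string `padded` is an invented example.

```python
# sep omitted: runs of whitespace collapse, and the ends yield no empty strings.
padded = " a  b "
collapsed = padded.split()

# sep given explicitly: every single space is a boundary, so empty strings appear.
explicit = padded.split(" ")

# Empty or whitespace-only input gives an empty list only when sep is omitted.
empty_default = "".split()
blank_default = "   ".split()
empty_explicit = "".split(" ")

print(collapsed)       # ['a', 'b']
print(explicit)        # ['', 'a', '', 'b', '']
print(empty_default)   # []
print(blank_default)   # []
print(empty_explicit)  # ['']
```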

Mark Byers
4

Any problem with simple a.split()?

YOU
  • 2
    The question was how to split on delimiter+ (one or more). Your answer says that any whitespace will be taken as the delimiter, which is not the correct answer. – Risinek Apr 08 '16 at 10:10
3

If you want to split on one or more occurrences of a delimiter, and would rather not rely on the no-argument split() happening to match your use case, you can match the delimiter with a regular expression. The following treats one or more occurrences of . as the delimiter:

import re

s = 'a.b....c......d.ef...g'
# Raw string keeps the backslash intact; \.+ matches one or more literal dots.
sp = re.compile(r'\.+').split(s)
print(sp)

which gives:

['a', 'b', 'c', 'd', 'ef', 'g']
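The same idea can be wrapped in a small helper that works for an arbitrary delimiter. This is a sketch, not part of the standard library: the name `split_on_runs` is hypothetical, and it relies on `re.escape` so that characters with special meaning in a pattern are matched literally.

```python
import re

def split_on_runs(text, delimiter):
    """Split text on one or more consecutive occurrences of delimiter."""
    # re.escape makes characters such as "." or "|" match literally.
    # The non-capturing group lets multi-character delimiters repeat as a unit
    # without re.split adding the matched delimiters to the result.
    pattern = "(?:" + re.escape(delimiter) + ")+"
    return re.split(pattern, text)

dots = split_on_runs("a.b....c......d.ef...g", ".")
dashes = split_on_runs("x--y----z", "--")

print(dots)    # ['a', 'b', 'c', 'd', 'ef', 'g']
print(dashes)  # ['x', 'y', 'z']
```

As with any regex split, a delimiter at the very start or end of the input still produces an empty string at that end.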
theferrit32
1

Here is one more way, useful when the delimiter is something other than whitespace and s.split() will not work.

For example, with s = "Python,is,,more,,,,,flexible":

In [27]: s = "Python,is,,more,,,,,flexible"

In [28]: str_list = list(filter(lambda x: len(x) > 0, s.split(",")))

In [29]: str_list
Out[29]: ['Python', 'is', 'more', 'flexible']
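An equivalent way to write the same filter is a list comprehension, which some readers may find easier to scan. This is a sketch of the same approach, not a different technique; the name `parts` is chosen here for illustration.

```python
s = "Python,is,,more,,,,,flexible"

# Empty strings are falsy, so "if part" keeps only the non-empty fields.
parts = [part for part in s.split(",") if part]

print(parts)  # ['Python', 'is', 'more', 'flexible']
```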
anshu kumar