I was expecting an empty list but I got:

assert 1 == "".split(/\s+/).size()

and

assert 0 == "".split().size()

Alexander Suraphel

2 Answers

Maybe you should use tokenize() instead?

assert "".tokenize().size() == 0
assert "foo bar".tokenize() == ['foo', 'bar']
Dónal

I just found out that Java's String.split() and Python's str.split() follow this pattern as well. Check out http://docs.python.org/2/library/stdtypes.html#str.split.

The SO question "When splitting an empty string in Python, why does split() return an empty list while split('\n') returns ['']?" contains must-read answers as well.

The top voted answer explains:

The str.split() method has two algorithms. If no arguments are given, it splits on repeated runs of whitespace. However, if an argument is given, it is treated as a single delimiter with no repeated runs.

In the case of splitting an empty string, the first mode (no argument) will return an empty list because the whitespace is eaten and there are no values to put in the result list.

In contrast, the second mode (with an argument such as \n) will produce the first empty field. Consider if you had written '\n'.split('\n'), you would get two fields (one split, gives you two halves).
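
The two modes described above can be checked directly in Python 3. This is only a minimal sketch; the variable names are made up for illustration:

```python
# No argument: runs of whitespace are collapsed, so an empty
# (or all-whitespace) string leaves nothing to put in the result.
no_arg_empty = "".split()             # []
no_arg_spaces = "  a   b ".split()    # ['a', 'b']

# Explicit delimiter: each delimiter separates two fields,
# and empty fields are kept.
sep_empty = "".split("\n")            # ['']
sep_newline = "\n".split("\n")        # ['', '']

print(no_arg_empty, no_arg_spaces, sep_empty, sep_newline)
```

The Groovy results in the question line up with this: split(/\s+/) behaves like the explicit-delimiter mode and keeps the single empty field, while the no-argument split() behaves like the whitespace mode.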

This makes sense in the example below, which splits CSV data:

>>> data = '''\
Guido,BDFL,,Amsterdam
Barry,FLUFL,,USA
,,,USA
'''
>>> for line in data.splitlines():
        print(line.split(','))

['Guido', 'BDFL', '', 'Amsterdam']
['Barry', 'FLUFL', '', 'USA']
['', '', '', 'USA']

If '' (the empty string) were not treated as an actual value, the last line would give ['USA'] instead of ['', '', '', 'USA'], which is not what you would expect.
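
To see why dropping empty fields would be a problem, here is a small sketch, assuming the last CSV line from the example and hypothetical variable names. Once the empty strings are filtered out, the country is no longer in the fourth column:

```python
line = ",,,USA"

# Explicit delimiter keeps empty fields, so column positions are preserved.
fields = line.split(",")              # ['', '', '', 'USA']
country = fields[3]                   # 'USA' is still the fourth column

# Dropping the empty strings collapses the row to a single column.
non_empty = [f for f in fields if f]  # ['USA']

print(fields, country, non_empty)
```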

Alexander Suraphel