Space complexity of split() function in python

Question

Is the following code executed in place, or does it use extra space? Note that sentence was initially a string. Thanks, I appreciate the help.

sentence = "hello world"

sentence = sentence.split()

The source is a string, the result is a list. How could this be executed in-place? — GPhilo, Oct 22 '19 at 11:30
@GPhilo The substrings in the list could use the same backing array as the original immutable string. — tobias_k, Oct 22 '19 at 11:31
python strings are basically immutable, there is no action that happens "in place" on them, whatever you do will always create another string (or in your case, another object entirely - a list) — Ofer Sadan, Oct 22 '19 at 11:32
No definite answer, but after splitting a very, very long string (~5GB) the memory consumption of my interactive Python session (IPython 5.5 using Python 3.6.8) about doubled. — tobias_k, Oct 22 '19 at 11:36

score 5 · Accepted Answer · answered Oct 22 '19 at 11:42


In Python, strings are immutable objects, which means they cannot be changed "in place" at all. Every operation on them produces a new object in new memory, and the old object is reclaimed once no references to it remain (in CPython this happens through reference counting, with the garbage collector handling reference cycles). One way to see this for yourself:

>>> a = 'hello world'
>>> id(a)
1838856511920
>>> b = a
>>> id(b)
1838856511920
>>> a += '!'
>>> id(a)
1838856512944
>>> id(b)
1838856511920

As you can see, while b and a refer to the same underlying object, their id is the same. As soon as a is rebound to the result of the concatenation, it has a new id, meaning it is a different object (in CPython, at a different address in memory). The object that was left unchanged (b) still has the same id.

To check it in your example:

>>> sentence = "hello world"
>>> id(sentence)
1838856521584
>>> sentence = sentence.split()
>>> id(sentence)
1838853280840

Once again, the two objects do not occupy the same memory. We can also look at how much space each of them takes up:

>>> import sys
>>> sentence = "hello world"
>>> sys.getsizeof(sentence)
60
>>> sentence = sentence.split()
>>> sys.getsizeof(sentence)
160


Ofer Sadan


While this is all true, I think you are taking the question too literally, or focusing too much on the "in-place" wording. – tobias_k Oct 22 '19 at 11:47
@tobias_k you might be right which is why I added the size comparison at the end, because maybe that's the real important factor for him. Either way, your experiment demonstrates both aspects as well – Ofer Sadan Oct 22 '19 at 11:49
I don't think the use of `sys.getsizeof` is correct here, as it does not reflect the actual additionally needed memory. See my example. – tobias_k Oct 22 '19 at 12:01
`sys.getsizeof` doesn't prove anything by itself, that's true, but it is a measure of how much memory is taken, regardless of the reason. It isn't exact, because what it reports is whatever the object's own `__sizeof__` implementation returns. Your actual tests of memory use are better in that regard, I agree – Ofer Sadan Oct 22 '19 at 12:05

tobias_k · Answer 2 · 2019-10-22T12:16:36.543

As noted in the comments, the operation cannot be "in-place", as that would mean working within the same data structure, but you are obviously creating a new data structure (a list) from the string. I will assume that your actual question was whether the substrings returned by split will use the same backing array of characters as the original immutable string.¹⁾

A quick experiment seems to suggest that they do not.

In [1]: s = (("A" * 100000) + " ") * 50000

In [2]: len(s)
Out[2]: 5000050000

In [3]: l = s.split()

After the first step, top shows that the ipython process uses ~30% of my memory, and after the split it uses ~60%, so the backing array, taking up the bulk of the memory, is not reused. Of course, this may be implementation specific. I was using IPython 5.5.0 (based on Python 3.6.8), but get the same result with Python 2.7.15, too. This also seems to apply to string slicing.
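For anyone who wants to repeat this without a 5 GB string, here is a scaled-down sketch that uses the standard `tracemalloc` module instead of watching `top`. The sizes (200 words of 10,000 characters) are my own choice, and the byte counts assume CPython, where an ASCII string stores one byte per character; other implementations may report different numbers.

```python
import tracemalloc

# Scaled-down stand-in for the experiment: 200 words of 10,000 characters.
# Built before tracing starts, so the original string is not counted.
s = (("A" * 10_000) + " ") * 200

tracemalloc.start()
before, _ = tracemalloc.get_traced_memory()
parts = s.split()
after, _ = tracemalloc.get_traced_memory()
tracemalloc.stop()

extra = after - before                # bytes still allocated because of split()
payload = sum(len(p) for p in parts)  # total characters held by the substrings

print(f"characters in substrings: {payload}")
print(f"extra bytes traced after split(): {extra}")
```

If the substrings shared the original buffer, the extra bytes would be roughly the size of the list alone. Seeing at least as many extra bytes as there are characters is consistent with the characters having been copied.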

¹⁾ Precisely because strings are immutable, this would be possible, and to the best of my knowledge other languages, like Java, do this, although I cannot test it at the moment.

Note: The use of sys.getsizeof is a bit misleading here, as it seems to measure only the size of the data structure itself, not the elements it contains.

In [4]: sys.getsizeof(s)
Out[4]: 5000050049

In [5]: sys.getsizeof(l)
Out[5]: 433816

According to that, the list takes up only a fraction of the space of the original string, but as noted above, the actual memory consumption doubled.
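To get a number closer to the real footprint, one option is to add the sizes of the elements to the size of the list. The helper below is a hypothetical sketch, not a standard function; it only goes one level deep, which is enough for a list of strings, and it relies on `sys.getsizeof`, whose exact values are implementation specific.

```python
import sys

def shallow_and_deep_size(strings):
    """Return (size of the list object alone, size including its elements)."""
    shallow = sys.getsizeof(strings)
    # Count each distinct object once, in case the list repeats a reference.
    unique = {id(item): item for item in strings}
    deep = shallow + sum(sys.getsizeof(item) for item in unique.values())
    return shallow, deep

s = (("A" * 1_000) + " ") * 100
parts = s.split()
shallow, deep = shallow_and_deep_size(parts)

print("original string:", sys.getsizeof(s))
print("list only:      ", shallow)
print("list + elements:", deep)
```

Counted this way, the result of the split should come out at least as large as the characters it holds, which matches what `top` showed better than the size of the list alone does.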

I don't think this can be implementation specific because the definition of strings in python is immutable objects. Other than that, good demo! — Ofer Sadan, Oct 22 '19 at 11:50
@OferSadan But precisely _because_ they are immutable they _could_ in fact share the same backing array (and I would have assumed them to do so). Not sure, but I think Java e.g. does that. — tobias_k, Oct 22 '19 at 11:51
CPython's `str.split()` ends up calling the C function [`split_whitespace`](https://github.com/python/cpython/blob/3.8/Objects/stringlib/split.h#L54), which has an optimisation where the same backing is indeed used if the string wasn't split, but otherwise always allocates new strings for each span — Sam Mason, Oct 22 '19 at 12:47
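A small sketch to probe the optimisation described in the comment above. Whether the original object is reused is a CPython implementation detail, not something the language guarantees, so the identity check is printed rather than relied on. The variable names are just for illustration.

```python
no_space = "helloworld"
with_space = "hello world"

single = no_space.split()    # nothing to split on
several = with_space.split()

# Implementation detail: this may be True when the string was not split,
# because the list can then hold the original object itself.
reused = single[0] is no_space

print("pieces without whitespace:", single)
print("original object reused:", reused)
print("pieces with whitespace:", several)
```

When the string does contain whitespace, each piece is a separate string object, which fits the doubling of memory seen in the experiment.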
