Since tuples are immutable, why does slicing them make a copy instead of a view?

Question

As far as I understand it, tuples and strings are immutable to allow optimizations such as re-using memory that won't change. However, one obvious optimization, making slices of tuples refer to the same memory as the original tuple, is not included in Python.

I know that this optimization isn't included because, when I time the following function, the time taken grows as O(n^2) rather than O(n), so full copying is taking place:

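```python
def test(n):
    tup = tuple(range(n))
    for i in range(n):  # xrange in the original Python 2 code
        tup[0:i]        # each slice copies i element pointers into a new tuple
```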
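A minimal timing harness for the benchmark above, assuming Python 3 and the standard `timeit` module. The function is renamed `slice_prefixes` (a hypothetical name) but does the same work: the slices copy 0 + 1 + ... + (n - 1) = n(n - 1)/2 pointers in total, so if slicing copies, doubling `n` should roughly quadruple the time.

```python
import timeit


def slice_prefixes(n):
    """The question's benchmark: build every prefix slice tup[0:i]."""
    tup = tuple(range(n))
    for i in range(n):
        tup[0:i]  # allocates a new tuple and copies i element pointers


def best_time(n, repeat=5):
    """Best-of-`repeat` wall time, in seconds, for one call of slice_prefixes(n)."""
    return min(timeit.repeat(lambda: slice_prefixes(n), number=1, repeat=repeat))


if __name__ == "__main__":
    previous = None
    for n in (2_000, 4_000, 8_000):
        t = best_time(n)
        ratio = "" if previous is None else f"  (x{t / previous:.1f} vs previous n)"
        print(f"n={n:>5}: {t:.4f} s{ratio}")
        previous = t
```

Taking the best of several runs reduces noise; exact timings and ratios depend on the machine and interpreter.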

Is there some behavior of Python that would change if this optimization were implemented? Is there some performance benefit to copying even when the original is immutable?

Sometimes the answer is "because no one has taken the time to implement it fully yet". — Ignacio Vazquez-Abrams, Jan 10 '16 at 20:40
It might be a tradeoff between reducing the amount of copying, as you are pointing out, and reducing the number of reference counts to the original tuple so that it can be garbage collected earlier. — cr3, Jan 10 '16 at 20:47
Algorithmic complexity isn't the end-all; I wouldn't be surprised if creating views has a (comparatively) large constant cost. — roippi, Jan 10 '16 at 21:14
Related: [Tuple slicing not returning a new object as opposed to list slicing](https://stackoverflow.com/q/58507216/7851470) — Georgy, Oct 22 '19 at 15:42
Great question! I think it should. In my opinion, the garbage-collection concern doesn't make much sense, because when the whole tuple is copied, it's just a reference copy. — Jaehyun Yeom, Apr 12 '21 at 17:30
It should be okay to implement it by keeping (the original whole tuple, begin_index, length). — Jaehyun Yeom, Apr 12 '21 at 17:47
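Two of the comments turn on what CPython actually copies when a tuple is sliced. The quick check below is CPython-specific behaviour, not a language guarantee: a partial slice is a new tuple whose pointers refer to the same element objects (a shallow copy), while a full-range slice of an exact tuple returns the original object itself. Lists are used as elements so that the identity checks are meaningful.

```python
t = tuple([i] for i in range(5))  # elements are distinct list objects

part = t[1:4]
print(part is t)          # False: a new tuple object was allocated
print(part[0] is t[1])    # True: only the element pointers were copied

# CPython shortcut: a full-range slice of an exact tuple is the tuple itself.
print(t[:] is t)          # True on CPython
print(t[0:len(t)] is t)   # True on CPython
```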

score 3 · Accepted Answer · answered Jan 10 '16 at 23:55

By view, are you thinking of something equivalent to what numpy does? I'm familiar with how and why numpy does that.

A numpy array is an object with shape and dtype information, plus a data buffer. You can see this information in the __array_interface__ property. A view is a new numpy object with its own shape attribute, but its data pointer points somewhere inside the source buffer instead of to a new buffer. It also has a flag (OWNDATA) that says "I don't own the buffer". The view also keeps a reference to the original array (its base attribute), so the data buffer is not freed if the original (owner) array is deleted and garbage collected.

This use of views can be a big time saver, especially with very large arrays (questions about memory errors are common on SO). Views also allow a different dtype, so the same data buffer can be viewed as 4-byte integers, or as 1-byte characters, etc.
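The numpy behaviour described in the two paragraphs above can be seen directly. This is a small sketch assuming numpy is installed; it shows the `base` reference, the `OWNDATA` flag, writes going through to the shared buffer, and the same bytes reinterpreted with a different dtype.

```python
import numpy as np

owner = np.arange(10, dtype=np.int32)   # owns its data buffer
view = owner[2:6]                        # a basic slice is a view, not a copy

print(view.base is owner)                # True: the view references its owner
print(view.flags["OWNDATA"])             # False: "I don't own the buffer"
print(np.shares_memory(owner, view))     # True

view[0] = -1                             # writes go to the shared buffer
print(owner[2])                          # -1

# Same 16 bytes, different dtype: 4 int32 values seen as 16 uint8 values.
as_bytes = view.view(np.uint8)
print(as_bytes.shape)                    # (16,)

# Dropping the owner's name does not free the buffer; the view keeps it alive.
del owner
print(view.tolist())                     # [-1, 3, 4, 5]
```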

How would this apply to tuples? My guess is that it would require a lot of extra baggage. In CPython, a tuple stores its fixed set of object pointers in a C array embedded directly in the tuple object. A view would use the same array, but with its own start and end markers (pointers and/or lengths), so it would need a different object layout, and code that reads tuples would have to handle both kinds. What about sharing flags? Garbage collection?
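To make the trade-off concrete, here is a hypothetical pure-Python `TupleView` that stores the (original tuple, start, length) triple suggested in the comments. It is only a sketch, not how CPython works: slicing a view is O(1) because no pointers are copied, but every index access goes through an extra offset, and even a tiny view holds a reference to the whole original tuple. Only step-1 slices are handled.

```python
from collections.abc import Sequence


class TupleView(Sequence):
    """Hypothetical read-only view over a tuple, stored as (base, start, length).

    Slicing a TupleView returns another view on the same base tuple in O(1):
    no element pointers are copied. Only step-1 slices are supported.
    """

    __slots__ = ("_base", "_start", "_len")

    def __init__(self, base, start=0, stop=None):
        # Normalize start/stop the same way ordinary slicing does.
        start, stop, _ = slice(start, stop).indices(len(base))
        self._base = base              # keeps the *whole* original tuple alive
        self._start = start
        self._len = max(0, stop - start)

    def __len__(self):
        return self._len

    def __getitem__(self, index):
        if isinstance(index, slice):
            start, stop, step = index.indices(self._len)
            if step != 1:
                raise ValueError("this sketch only supports step-1 slices")
            # New view on the same base tuple; only the offsets change.
            return TupleView(self._base, self._start + start,
                             self._start + max(start, stop))
        if index < 0:
            index += self._len
        if not 0 <= index < self._len:
            raise IndexError("TupleView index out of range")
        return self._base[self._start + index]

    def __repr__(self):
        return f"TupleView({tuple(self)!r})"


big = tuple(range(1_000_000))
head = TupleView(big)[:3]
print(head)               # TupleView((0, 1, 2))
print(head._base is big)  # True: a 3-element view pins a million-element tuple
```

The last line is the memory-retention concern raised in the comments: with copying slices, `big` could be freed as soon as it is no longer referenced, but a view keeps it alive for as long as the view exists.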

And what's the typical size and use of tuples? A common use of tuples is to pass arguments to a function. My guess is that the majority of tuples in a typical Python run are small: 0, 1 or 2 elements. Slices are allowed, but are they very common? On small tuples, or on very large ones?
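For small tuples the copy being avoided is tiny. The sketch below, assuming CPython, shows that each extra element adds exactly one pointer to a tuple's size, which is consistent with the pointers being stored inline. A hypothetical view object would need at least a reference to its base plus an offset, which is about as much as the 0 to 2 pointers it would save copying; the absolute sizes printed vary by Python version and platform.

```python
import struct
import sys

pointer_size = struct.calcsize("P")  # size of one object pointer (8 on 64-bit builds)

for t in [(), (1,), (1, 2), (1, 2, 3)]:
    print(len(t), sys.getsizeof(t))

# Each extra element costs exactly one pointer in the tuple object itself.
assert sys.getsizeof((1, 2, 3)) - sys.getsizeof((1, 2)) == pointer_size
```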

Would there be any unintended consequences to making tuple slices views (in the numpy sense)? The distinction between views and copies is one of the harder things for numpy users to grasp. Since a tuple is supposed to be immutable (that is, the pointers in the tuple cannot be changed), it is possible that implementing views would be invisible to users. But I still wonder.
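The standard library already has a numpy-style view for bytes-like objects, `memoryview`, which makes a useful analogy (it does not apply to tuples). With a mutable `bytearray` the copy/view distinction is visible because the contents can diverge; with immutable `bytes` the contents never diverge, and the only observable difference is lifetime: a small view keeps the whole buffer referenced.

```python
# Mutable buffer: the difference between a copy and a view is visible.
data = bytearray(b"abcdef")
copied = bytes(data[1:4])          # slicing a bytearray copies
viewed = memoryview(data)[1:4]     # slicing a memoryview does not
data[1] = ord("X")
print(copied)                      # b'bcd'
print(viewed.tobytes())            # b'Xcd'

# Immutable buffer: the contents can never diverge, so only lifetime differs.
blob = bytes(range(256)) * 4096    # 1 MiB
small_view = memoryview(blob)[:3]  # no copy, but references all of blob
small_copy = blob[:3]              # copies 3 bytes, no link to blob
print(small_view.obj is blob)      # True
print(small_copy == small_view.tobytes())  # True
```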

It may make most sense to try this idea on a branch of PyPy, unless you really want to dig into the CPython code. Or you could prototype it as a custom class in Cython.
