
In Python (2 and 3), slicing a list always returns a new object, e.g.:

l1 = [1,2,3,4]
print(id(l1))
l2 = l1[:]
print(id(l2))

Output

140344378384464
140344378387272

If the same thing is repeated with a tuple, the same object is returned, e.g.:

t1 = (1,2,3,4)
t2 = t1[:]
print(id(t1))
print(id(t2))

Output

140344379214896
140344379214896

It would be great if someone could shed some light on why this happens. Throughout my Python experience I was under the impression that an empty slice returns a new object.

My understanding is that it returns the same object because tuples are immutable and there's no point in creating a new copy. But again, this isn't mentioned anywhere in the documentation.

wim
Vijay Jangir
  • Related: [Since Tuples are immutable, why does slicing them make a copy instead of a view?](https://stackoverflow.com/q/34710510/7851470) – Georgy Oct 22 '19 at 15:40
  • `l2 = tuple(iter(l1))` bypasses the optimisation – Chris_Rands Oct 22 '19 at 15:47
  • Noticed the [c-api for `PyTuple_GetSlice`](https://docs.python.org/3/c-api/tuple.html#c.PyTuple_GetSlice) was documented inaccurately after seeing your question. Docs have now been fixed (this was [bpo issue38557](https://bugs.python.org/issue38557)). – wim Oct 26 '19 at 21:20
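The comment about `tuple(iter(...))` can be checked with a short sketch. The identity results are what CPython does and are not guaranteed by the language; the variable names are only for illustration.

```python
t = (1, 2, 3, 4)

same = tuple(t)            # CPython hands an exact tuple straight back
rebuilt = tuple(iter(t))   # the iterator hides the source, so a new tuple is built

print(same is t)       # implementation-dependent; True on CPython
print(rebuilt is t)    # False on CPython: a fresh object
print(rebuilt == t)    # equal contents either way
```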

4 Answers


Implementations are free to return identical instances for immutable types (in CPython, you may sometimes see similar optimizations for strings and integers). Since the object cannot be changed, nothing in user code needs to care whether it holds a unique instance or just another reference to an existing one.
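As a rough illustration of that point, the sketch below compares equality, which the language guarantees, with identity, which it does not. `Point` is a hypothetical tuple subclass introduced only for this example, and the identity results depend on the interpreter.

```python
import platform


class Point(tuple):
    """Hypothetical tuple subclass, used only for this illustration."""


t = (1, 2, 3, 4)
s = "immutable"
p = Point((1, 2))

# Equality of a full slice with its source holds on any implementation.
assert t[:] == t and s[:] == s and p[:] == p

# Identity is an implementation choice, so code should not depend on it.
results = {
    "tuple": t[:] is t,
    "str": s[:] is s,
    "tuple subclass": p[:] is p,
}
print(platform.python_implementation(), results)
```

On CPython the exact `tuple` and `str` come back as the same object, while the subclass instance does not, which matches the `PyTuple_CheckExact` test in the C code.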

You can find the short-circuit in the C code here.

static PyObject*
tuplesubscript(PyTupleObject* self, PyObject* item)
{
    ... /* note: irrelevant parts snipped out */
    if (start == 0 && step == 1 &&
        slicelength == PyTuple_GET_SIZE(self) &&
        PyTuple_CheckExact(self)) {
        Py_INCREF(self);          /* <--- increase reference count */
        return (PyObject *)self;  /* <--- return another pointer to same */
    }
    ...

This is an implementation detail; note that PyPy does not do the same.
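The condition in the C excerpt can be mirrored in pure Python to predict when CPython takes the shortcut. This is only a sketch: `full_slice_shortcut_applies` is a made-up helper name, and it models the quoted condition rather than the whole C function.

```python
def full_slice_shortcut_applies(t, s):
    """Mirror the quoted C test: start == 0, step == 1, full length, exact tuple."""
    start, stop, step = s.indices(len(t))        # resolve None and negative bounds
    slicelength = len(range(start, stop, step))  # number of items the slice selects
    return (start == 0 and step == 1
            and slicelength == len(t)
            and type(t) is tuple)


t = (1, 2, 3, 4)
cases = [slice(None), slice(0, 4), slice(-4, None), slice(0, 100),
         slice(1, None), slice(None, None, 2), slice(None, None, -1)]

for s in cases:
    # Second column: the prediction; third column: what the interpreter did.
    print(s, full_slice_shortcut_applies(t, s), t[s] is t)
```

On CPython the two columns should agree for every case; other implementations may always print `False` in the last column.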

wim
  • Thanks @wim. This makes sense now. One off-topic question, as I'm not experienced in C: what exactly does a->ob_item do? I tried looking it up, but all I could understand is that it takes the address of "a" and moves it "ob_item" forward. My understanding was that ob_item holds the number of storage addresses that make up one item. #offTheTopic – Vijay Jangir Oct 22 '19 at 16:19
  • It might help to look at the typedef for tuple, [here](https://github.com/python/cpython/blob/3.8/Include/cpython/tupleobject.h#L14). So `a->ob_item` is like `(*a).ob_item`, i.e. it gets the member called `ob_item` from the `PyTupleObject` that a is pointing to, and the + ilow then advances to the beginning of the slice. – wim Oct 22 '19 at 17:08

It's an implementation detail. Because lists are mutable, l1[:] must create a copy: you wouldn't expect changes to l2 to affect l1.

Since a tuple is immutable, though, there's nothing you can do to t2 that would affect t1 in any visible way, so the implementation is free (but not required) to use the same object for t1 and t1[:].
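A small sketch of why the list case has to copy. The names `alias` and `sliced` are only for illustration.

```python
l1 = [1, 2, 3, 4]
alias = l1       # a second name for the same list object
sliced = l1[:]   # a new list holding the same items

sliced.append(5)  # changes only the copy
alias.append(6)   # changes l1 too, because alias is l1

print(l1)      # [1, 2, 3, 4, 6]
print(sliced)  # [1, 2, 3, 4, 5]
```

If `l1[:]` returned `l1` itself, the first `append` would have shown up in `l1` as well. A tuple has no such mutating methods, so sharing the object cannot be observed this way.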

chepner

In Python 3.*, my_list[:] is syntactic sugar for type(my_list).__getitem__(my_list, slice_object), where slice_object is the slice object built from the expression [:], i.e. slice(None, None, None). The bounds are resolved against the length of my_list only inside __getitem__. Objects that behave this way are called subscriptable in the Python data model (see here). For lists and tuples, __getitem__ is a built-in method.
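To see what actually reaches `__getitem__`, a minimal sketch can echo the argument back. `Recorder` is a hypothetical class written only for this demonstration.

```python
class Recorder:
    """Hypothetical helper: returns whatever object the subscript passes in."""

    def __getitem__(self, item):
        return item


r = Recorder()
print(r[:])     # slice(None, None, None)
print(r[1:3])   # slice(1, 3, None)
print(r[::2])   # slice(None, None, 2)

# The explicit spelling gives the same result as the subscript syntax.
t = (1, 2, 3, 4)
explicit = type(t).__getitem__(t, slice(None, None, None))
print(explicit == t[:])   # True
```

The slice object carries no length information; `None` bounds are filled in by the container's own `__getitem__`.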

In CPython, subscripting a list or a tuple is executed by the bytecode operation BINARY_SUBSCR, which is implemented for tuples here and for lists here.
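The opcode can be inspected with the standard `dis` module. This is a sketch only: opcode names are a CPython detail that changes between releases, so an interpreter newer than the one this answer was written against may list a differently named instruction for the subscript.

```python
import dis

# Compile the expression without running it; "t" does not need to exist.
code = compile("t[:]", "<demo>", "eval")
opnames = [ins.opname for ins in dis.get_instructions(code)]

# Printed rather than asserted, because the names vary by CPython version.
print(opnames)
```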

For tuples, walking through the code shows that static PyObject* tuplesubscript(PyTupleObject* self, PyObject* item) returns a reference to the same PyTupleObject it received as its argument when item is a PySlice that evaluates to the whole tuple.

    static PyObject*
    tuplesubscript(PyTupleObject* self, PyObject* item)
    {
        /* checks if item is an index */
        if (PyIndex_Check(item)) {
            ...
        }
        /* else it is a slice */
        else if (PySlice_Check(item)) {
            ...
            /* unpacks the slice into start, stop and step */
            if (PySlice_Unpack(item, &start, &stop, &step) < 0) {
                return NULL;
            }
            ...
            }
            /* if the slice starts at 0, steps by 1 and covers the whole tuple, take the shortcut below */
            else if (start == 0 && step == 1 &&
                     slicelength == PyTuple_GET_SIZE(self) &&
                     PyTuple_CheckExact(self)) {
                Py_INCREF(self); /* increase the reference count for the tuple */
                return (PyObject *)self; /* and return a reference to the same tuple. */
            ...
    }

Now examine the code for static PyObject * list_subscript(PyListObject* self, PyObject* item) and you can see for yourself that, whatever the slice, a new list object is always returned.
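The list behaviour can be checked from Python without reading the C source. In this sketch every slice below selects the whole list, which is exactly the case where a tuple would be handed back unchanged on CPython.

```python
items = [1, 2, 3, 4]
whole_list_slices = [slice(None), slice(0, 4), slice(0, None), slice(-4, None)]

results = [items[s] for s in whole_list_slices]

print(all(r == items for r in results))   # True: same contents each time
print(any(r is items for r in results))   # False: never the original object
```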

Fakher Mokadem
  • Note that this is [different in 2.7](https://docs.python.org/2/reference/datamodel.html#object.__getslice__), where a `start:stop` slice on a built-in type, including `tup[:]`, does not go via `BINARY_SUBSCR`. The extended slicing `start:stop:step` does go through subscription, though. – wim Oct 24 '19 at 21:55
  • Okay, thanks will update to specify python version. – Fakher Mokadem Oct 25 '19 at 07:40

I'm not sure about this, but it seems that Python gives you a new reference to the same object to avoid copying, since the two tuples would be identical (and, being tuples, immutable).

michotross