
I understand the shallow-copy behaviour of a list:

from copy import copy

list1 = [1, 2, 3, 4]
list2 = copy(list1) # shallow copy of list1

list1 is list2 # False as expected
False

list1[0] is list2[0] # True as expected
True

list1[0] = 10 # this will change list1 but not list2

# the result is as expected
list1
[10, 2, 3, 4]
list2
[1, 2, 3, 4]

I tried similar operations with a pandas Series, and the behaviour is different.

import pandas as pd

series1 = pd.Series([1, 2, 3, 4])
series2 = series1.copy(deep=False) # shallow copy

series1 is series2 # as expected
False

series1[0] is series2[0] # Q1: I was expecting True
False

# Q2: I was expecting the change in series1 but not in series2
series1[0] = 10
series1
0    10
1     2
2     3
3     4
dtype: int64

series2
0    10
1     2
2     3
3     4
dtype: int64


Question 1: Why are series1[0] and series2[0] not the same object?

Question 2: Why did changing the first element of series1 also change series2, even though the two elements are different objects as per Question 1?

Why is the behaviour not consistent between list and Series? Is there a special reason for this?

cryptomanic
  • `Why the behaviour is not consistent for list and series` - `list` is a Python object, `Series` is a Pandas object - apparently the Pandas Series designers desired the behaviour they gave it. – wwii Feb 11 '20 at 18:32
  • `Why series1[0] and series2[0] is not the same?` - Are you expecting it to be the same object or to have the same value? – wwii Feb 11 '20 at 18:33
  • `series1.values is series2.values` -> `True` – wwii Feb 11 '20 at 18:38
  • @wwii I was expecting the same object – cryptomanic Feb 11 '20 at 18:41
  • @wwii well, they are *both python objects* – juanpa.arrivillaga Feb 11 '20 at 18:56
  • @cryptomanic the key issue is that this is **never the case** for `pandas.Series` objects (unless you use dtype=object). – juanpa.arrivillaga Feb 11 '20 at 19:01
  • @juanpa.arrivillaga - I guess I meant that a list is a Python object in the sense that it is part of the language, it can be found in the Python [standard type hierarchy](https://docs.python.org/3/reference/datamodel.html#the-standard-type-hierarchy) whereas a Pandas Series object is not part of the (core?) language and was designed by (probably) different people for a different purpose. You're tryin' to trick me - ?? – wwii Feb 11 '20 at 19:12

1 Answer


The fundamental issue here is that a pandas Series is essentially a wrapper around a numpy.ndarray. For arrays that use an actual numeric dtype instead of dtype=object, the array stores raw primitive values rather than Python objects, so each value needs to be "boxed" into a new object every time you index. Note that this is the behavior without any copying at all:

>>> import pandas as pd
>>> s = pd.Series([1])
>>> s[0] is s[0]
False

With dtype=object the array stores references to the Python objects themselves, so the behavior becomes consistent with a list:

>>> s = pd.Series([1], dtype=object)
>>> s[0] is s[0]
True
>>> import copy
>>> copy.copy(s)[0] is s[0]
True
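
As for Question 2, here is a minimal sketch, assuming `numpy` is available alongside pandas and using `np.shares_memory` and `Series.to_numpy()` purely for illustration. It suggests that a shallow copy creates a new Series object over the same underlying buffer, which is why a write made through one Series could show up in the other even though indexing never returns the same object twice:

```python
import numpy as np
import pandas as pd

series1 = pd.Series([1, 2, 3, 4], dtype="int64")
series2 = series1.copy(deep=False)  # shallow copy: new Series, same data

# The two Series are distinct Python objects...
distinct_objects = series1 is not series2

# ...but their element data lives in the same block of memory.
shared_buffer = np.shares_memory(series1.to_numpy(), series2.to_numpy())

# Each access boxes the raw int64 into a fresh NumPy scalar,
# so an identity check fails even on one and the same Series.
first = series1.iloc[0]
second = series1.iloc[0]
same_object = first is second

print(distinct_objects, shared_buffer, same_object)
```

Whether a later assignment such as `series1[0] = 10` is still visible through `series2` depends on the pandas version: it was in the version used in the question, but with Copy-on-Write enabled (as far as I know, the default from pandas 3.0) the write should trigger a copy and leave `series2` unchanged.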
juanpa.arrivillaga