
Why does the function below change the global DataFrame named df? Shouldn't it just change a local df within the function, but not the global df?

import pandas as pd

df = pd.DataFrame()

def adding_var_inside_function(df):
    df['value'] = 0

print(df.columns) # Index([], dtype='object')
adding_var_inside_function(df)
print(df.columns) # Index([u'value'], dtype='object')
Michael
    Read [this](http://pandas.pydata.org/pandas-docs/stable/overview.html#mutability-and-copying-of-data). If you need an independent copy of your DF use this method: `df2 = df.copy()` – MaxU - stand with Ukraine Jun 15 '16 at 21:51
    So adding a column is one of the only things that can change the global DataFrame??? Seems like mutability should be an all or nothing affair for a particular type of object absent an explicit declaration. They don't even list out the other attributes that are mutable! Nevertheless, thanks for the explanation. – Michael Jun 15 '16 at 21:55

1 Answer


From the pandas docs:

Mutability and copying of data

All pandas data structures are value-mutable (the values they contain can be altered) but not always size-mutable. The length of a Series cannot be changed, but, for example, columns can be inserted into a DataFrame. However, the vast majority of methods produce new objects and leave the input data untouched. In general, though, we like to favor immutability where sensible.
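To make the distinction concrete, here is a minimal sketch contrasting in-place item assignment with a method that returns a new object. The function names `mutate` and `rebind` and the sample data are made up for illustration; they are not from the question.

```python
import pandas as pd

def mutate(frame):
    # Item assignment inserts a column into the very object the caller passed in.
    frame["value"] = 0

def rebind(frame):
    # drop() returns a new DataFrame. Assigning it to the local name only
    # rebinds that name; the caller's object is left as it was.
    frame = frame.drop(columns=["a"])
    return frame

df = pd.DataFrame({"a": [1, 2], "b": [3, 4]})

rebind_result = rebind(df)
cols_after_rebind = list(df.columns)   # caller still sees columns a and b

mutate(df)
cols_after_mutate = list(df.columns)   # caller now also sees 'value'
```

So the function argument is never copied on the way in. Whether the caller sees a change depends on whether the function modifies the object itself or merely points its local name at a different object.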

Here is another example, showing that individual cell values are mutable too:

In [21]: df
Out[21]:
   a  b  c
0  3  2  0
1  3  3  1
2  4  0  0
3  2  3  2
4  0  4  4

In [22]: df2 = df

In [23]: df2.loc[0, 'a'] = 100

In [24]: df
Out[24]:
     a  b  c
0  100  2  0
1    3  3  1
2    4  0  0
3    2  3  2
4    0  4  4

df2 is not a copy; it is just another name for the same object as df:

In [28]: id(df) == id(df2)
Out[28]: True
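The same check can be written with `is`, and compared against an explicit copy. This is a small sketch with made-up data and hypothetical variable names (`alias`, `clone`):

```python
import pandas as pd

df = pd.DataFrame({"a": [3, 3, 4], "b": [2, 3, 0]})

alias = df          # a second name for the same object
clone = df.copy()   # an independent object (copy() is deep by default)

same_object = alias is df
clone_is_separate = clone is not df

# Writing to the copy leaves the original untouched.
clone.loc[0, "a"] = 100
original_value = df.loc[0, "a"]
```

Here `same_object` and `clone_is_separate` are both `True`, and `original_value` is still 3 after the write to `clone`.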

Here is a version of your function that does not mutate the DataFrame passed to it:

def adding_var_inside_function(df):
    df = df.copy()
    df['value'] = 0
    return df

In [30]: df
Out[30]:
     a  b  c
0  100  2  0
1    3  3  1
2    4  0  0
3    2  3  2
4    0  4  4

In [31]: adding_var_inside_function(df)
Out[31]:
     a  b  c  value
0  100  2  0      0
1    3  3  1      0
2    4  0  0      0
3    2  3  2      0
4    0  4  4      0

In [32]: df
Out[32]:
     a  b  c
0  100  2  0
1    3  3  1
2    4  0  0
3    2  3  2
4    0  4  4
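As a possible alternative to copying by hand, `DataFrame.assign` returns a new DataFrame with the extra column and leaves its input alone. A minimal sketch, where the function name `adding_var_with_assign` and the sample data are only for illustration:

```python
import pandas as pd

def adding_var_with_assign(df):
    # assign() builds and returns a new DataFrame; the argument is not modified.
    return df.assign(value=0)

df = pd.DataFrame({"a": [100, 3], "b": [2, 3]})
result = adding_var_with_assign(df)

original_columns = list(df.columns)      # unchanged
result_columns = list(result.columns)    # includes the new 'value' column
```

Either way, the caller has to use the returned object, for example `df = adding_var_with_assign(df)`, if it wants to keep the new column.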
MaxU - stand with Ukraine