Modifying one dataframe appears to change another

Question

I am new to loops in Python and just ran into some strange behavior. I was doing calculations on multiple dataframes; to simplify the question, here is an illustration.

Suppose I have 3 dataframes filled with NaN:

import numpy as np
import pandas as pd

# allocate an uninitialized 15x10 array
data = np.empty((15, 10))
# fill every entry with NaN
data[:] = np.nan
# create the dataframe
dfnan = pd.DataFrame(data)
df1 = dfnan
df2 = dfnan
df3 = dfnan

After this step, all three dataframes contain only NaN, as expected.

But then, if I add two for loops in one block like below:

for i in range(0, 15, 1):
    df1.iloc[i] = 0

for j in range(0, 15, 1):
    df2.iloc[j] = df1.iloc[j].transform(lambda x: x+1)

Then df1, df2, and df3 all end up filled with 1. But shouldn't the result be:

df1 filled with 0, df2 filled with 1 and df3 filled with NaN (since I didn't make any change to it)?

Why does this happen, and how can I change it to get the result I want?

Try printing all 3 variables after the first for loop? What is the output? — Code-Apprentice, Oct 26 '21 at 22:01

score 1 · Accepted Answer · answered Oct 26 '21 at 22:35

Assignment never copies in Python. df1, df2, df3 and dfnan are all names for the same object (the one created by pd.DataFrame(data)). A change made through any one of these names is therefore visible through all the others, because they all point to the same object.
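A minimal sketch of the difference between binding a second name and making a copy. The small 3x2 frame and the variable names `same_object`, `copy_is_distinct`, `alias_changed` and `copy_untouched` are made up for this illustration and are not part of the question's code:

```python
import numpy as np
import pandas as pd

# small illustrative frame (the 3x2 shape is arbitrary)
dfnan = pd.DataFrame(np.full((3, 2), np.nan))

df1 = dfnan         # binds a second name to the same object; nothing is copied
df2 = dfnan.copy()  # builds a new, independent DataFrame

same_object = df1 is dfnan           # both names refer to one object
copy_is_distinct = df2 is not dfnan  # the copy is a different object

df1.iloc[0] = 0  # modifies the single object that df1 and dfnan both name

alias_changed = bool((dfnan.iloc[0] == 0).all())  # change is visible via dfnan
copy_untouched = bool(df2.iloc[0].isna().all())   # the copy still holds NaN
```

The `is` operator compares object identity, so it is a quick way to check whether two names refer to the same dataframe.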

This is a great read on the topic: https://nedbatchelder.com/text/names.html

To create independent copies, use the copy method:

dfnan = pd.DataFrame(data)
df1 = dfnan.copy()
df2 = dfnan.copy()
df3 = dfnan.copy()
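For completeness, here is a sketch of the question's two loops run against independent copies. It uses `+ 1` in place of the original `transform(lambda x: x+1)`, which should be equivalent here, and the three `*_all_*` check variables are names introduced only for this example:

```python
import numpy as np
import pandas as pd

# 15x10 array of NaN, as in the question
data = np.full((15, 10), np.nan)
dfnan = pd.DataFrame(data)

# each name now refers to its own DataFrame
df1 = dfnan.copy()
df2 = dfnan.copy()
df3 = dfnan.copy()

# set every row of df1 to 0
for i in range(15):
    df1.iloc[i] = 0

# set every row of df2 to the matching row of df1 plus 1
for j in range(15):
    df2.iloc[j] = df1.iloc[j] + 1

# checks on the outcome
df1_all_zero = bool((df1 == 0).all().all())
df2_all_one = bool((df2 == 1).all().all())
df3_all_nan = bool(df3.isna().all().all())
```

With the copies in place, df1 holds 0, df2 holds 1, and df3 stays NaN, which is the result the question expected.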

Rodalm

Yes I understand now. Thank you! – Lazer Oct 26 '21 at 22:55
