I am writing code to read data from Google Sheets using the gspread module.

First I read the spreadsheet and store the values in a variable called df. Afterwards, I create a variable called df2 from df to make some transformations (string to numeric), while keeping df (the original data) intact. However, the transformation made in df2 is carried over to df, the variable where I store the original data. I did not expect this; the change should occur only in df2.

Does anyone know why this is happening?

Please see the code below:

import gspread
import pandas as pd

sa = gspread.service_account(filename = "keys.json") 
sheet = sa.open("chupacabra") 
worksheet = sheet.worksheet("vaca_loca")

df = pd.DataFrame(worksheet.get("B2:I101"))

df

[df loaded](https://i.stack.imgur.com/lV3GJ.png)

df2 = df

df2["Taxa"] = df2["Taxa"].str.replace(",",".")
df2["Taxa"] = df2["Taxa"].str.replace("%","")
df2["Taxa"] = pd.to_numeric(df2["Taxa"])
df2["Taxa"] = df2["Taxa"]/100

df2

[df2 after string transformation](https://i.stack.imgur.com/cFWOg.png)

df 

[df carrying the transformation changes made in df2](https://i.stack.imgur.com/KsSsa.png)

I want the transformation to apply only to df2, while df remains intact.

  • Although I'm not sure whether I could correctly understand your situation, I proposed a modification point. Please confirm it. If I misunderstood your question and that was not useful, I apologize. – Tanaike Jan 07 '23 at 01:49
  • The reason is that the variables `df` and `df2` only hold *object references* (similar to pointers). That is, each variable tells Python where in memory the dataframe object's data is stored. `df2 = df` doesn't copy the object, it only copies the reference, which means that both variables end up pointing to the same object in memory. As per @Tanaike's comment and answer, if you want to copy the object, you have to call `copy()` on it. Assignment works this way for every object, but you only notice it with "mutable" objects (objects whose data you can change). – MatBailie Jan 07 '23 at 02:02
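
To illustrate the comment above, here is a minimal sketch of the difference between a second name for the same object and a real copy. The sample values are made up for illustration and are not taken from the original sheet; the names `alias` and `clone` are hypothetical.

```python
import pandas as pd

# Hypothetical sample data standing in for the sheet contents.
df = pd.DataFrame({"Taxa": ["1,5%", "2,0%"]})

alias = df          # binds a second name to the same object
clone = df.copy()   # creates a new, independent DataFrame

same_object = alias is df       # both names refer to one object
separate_object = clone is df   # copy() produced a distinct object

# A change made through the alias is visible through df as well,
# while the clone keeps the original strings.
alias["Taxa"] = alias["Taxa"].str.replace("%", "", regex=False)
```

Checking `alias is df` is a quick way to confirm whether two variables refer to the same DataFrame.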

1 Answer

In your script, I think the reason for your issue is that `df2 = df` does not copy the DataFrame. It only binds a second name to the same object, so changes made through df2 also show up in df. If my understanding is correct, how about the following modification?

From:

df2 = df

To:

df2 = df.copy()
  • With this modification, df2 is an independent copy of df, so changes made to df2 no longer affect df.
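
As a rough sketch of how the transformation from the question might look when applied to a copy, assuming a "Taxa" column of percentage strings with comma decimals (the sample values below are invented, since the real ones come from the worksheet):

```python
import pandas as pd

# Hypothetical sample data; the real values come from the worksheet.
df = pd.DataFrame({"Taxa": ["12,5%", "3,0%"]})

# copy() is deep by default, so df2 owns its own data.
df2 = df.copy()

# Same steps as in the question, applied to the copy only.
taxa = (
    df2["Taxa"]
    .str.replace(",", ".", regex=False)  # decimal comma -> decimal point
    .str.replace("%", "", regex=False)   # drop the percent sign
)
df2["Taxa"] = pd.to_numeric(taxa) / 100
```

After this runs, df still holds the original strings and only df2 contains the numeric fractions.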
Tanaike