I have a pandas dataframe, which for simplicity I'll mark as:
Column1 | Column2 | Column3 |
---|---|---|
0 | 0 | 0 |
0 | 0 | 0 |
0 | 0 | 0 |
And I have a function that transforms the data, for example:
def mutation(df, idx):
    df.iloc[idx] += 1
    return df
I'd like to hold each possible mutation applied to this dataset in a variable. For example:
var1 =
Column1 | Column2 | Column3 |
---|---|---|
1 | 0 | 0 |
0 | 0 | 0 |
0 | 0 | 0 |
var2 =
Column1 | Column2 | Column3 |
---|---|---|
0 | 0 | 0 |
1 | 0 | 0 |
0 | 0 | 0 |
. . .
var9 =
Column1 | Column2 | Column3 |
---|---|---|
0 | 0 | 0 |
0 | 0 | 0 |
0 | 0 | 1 |
And so on. I will end up holding (rows x columns)^n different variables (where n is the number of mutations I am applying), and the difference between any two of them is small. The problem is that my mutation is in-place: all the variables share the same data, so one mutation applies to all of them.
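To make the aliasing concrete, here is a minimal sketch. It assumes `idx` is a `(row, column)` tuple, which is one way to get the single-cell changes shown in the tables; the 3x3 frame of zeros is just the toy data from above.

```python
import pandas as pd

# Toy 3x3 frame of zeros, matching the tables above.
df = pd.DataFrame(0, index=range(3), columns=["Column1", "Column2", "Column3"])

def mutation(df, idx):
    # Assumption: idx is a (row, column) tuple, so this increments one cell.
    df.iloc[idx] += 1
    return df

var1 = mutation(df, (0, 0))
var2 = mutation(df, (1, 0))

# Both names point at the very same object as df,
# so var1 also carries the change that was meant only for var2.
print(var1 is df and var2 is df)
print(var1)
```

Both increments land in the one shared frame, so `var1` and `var2` are indistinguishable.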
Instead, I can mutate with a deepcopy:
def immutable_mutation(df, idx):
    df = df.copy(deep=True)
    df.iloc[idx] += 1
    return df
The problem is that this creates (rows x columns)^n full duplicates of my data instead of "just" holding the initial dataset and its mutations.
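As a rough illustration of that cost, the sketch below builds the nine single-step variants of the toy frame and compares their total size with the base. The names `base` and `variants`, the explicit `int64` dtype, and the `(row, column)` form of `idx` are assumptions made for the example.

```python
import numpy as np
import pandas as pd

base = pd.DataFrame(
    0, index=range(3), columns=["Column1", "Column2", "Column3"], dtype="int64"
)

def immutable_mutation(df, idx):
    df = df.copy(deep=True)  # every call duplicates the whole frame
    df.iloc[idx] += 1
    return df

# One variant per cell, ordered like var1..var9 above (down each column first).
variants = [immutable_mutation(base, (r, c)) for c in range(3) for r in range(3)]

base_bytes = base.memory_usage(index=False).sum()
total_bytes = sum(v.memory_usage(index=False).sum() for v in variants)

# Each variant owns its own buffer, so the total is nine times the base size.
print(base_bytes, total_bytes)
print(np.shares_memory(base.to_numpy(), variants[0].to_numpy()))
```

The base frame stays untouched, but every variant stores all nine cells even though it differs from the base in only one.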
My question is: is there any way to apply these mutations in an immutable way that does not require a deepcopy?
I am willing to migrate to Polars/Spark or any other library for that matter.