I'm wondering if there's any benefit to writing this pattern:

def feature_eng(df):
    df1 = df.copy()
    ...
    return df1

as opposed to this pattern:

def feature_eng(df):
    ...
    return df
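Say you have a raw DataFrame df_raw and you create df_feature by calling df_feature = feature_eng(df_raw). Inside the second pattern, df is the very same object as df_raw, not a copy. So any in-place operation in the body (a method called with inplace=True, or a column assignment such as df['new'] = ...) modifies df_raw itself. The first pattern works on a copy, so df_raw is left untouched. If you want to keep df_raw as it is, the first pattern is the one that gives the correct result.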
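Note that it is the in-place mutation, not the missing copy by itself, that changes the caller's DataFrame. A minimal sketch, assuming pandas is installed; the functions add_feature_inplace and add_feature_rebind and the toy data are hypothetical:

```python
import pandas as pd

def add_feature_inplace(df):
    # Column assignment mutates the object the caller passed in
    df["double"] = df["value"] * 2
    return df

def add_feature_rebind(df):
    # assign() returns a new DataFrame; only the local name df is rebound,
    # so the caller's object is left alone
    df = df.assign(double=df["value"] * 2)
    return df

raw_a = pd.DataFrame({"value": [1, 2, 3]})
add_feature_inplace(raw_a)
print(list(raw_a.columns))  # ['value', 'double']

raw_b = pd.DataFrame({"value": [1, 2, 3]})
add_feature_rebind(raw_b)
print(list(raw_b.columns))  # ['value']
```

Without a copy, the second pattern is only safe as long as every operation in the body returns a new object instead of modifying df in place; df.copy() removes that burden.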
A little example:
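import pandas as pd

# toy frame, for illustration only
df_raw = pd.DataFrame({'INDEX': [0, 1, 2], 'value': [10, 20, 30]})

def feature_eng1(df):
    # drops the column from the caller's DataFrame itself
    df.drop(columns=['INDEX'], inplace=True)
    return df

def feature_eng2(df):
    # works on a copy; the caller's DataFrame is untouched
    df1 = df.copy()
    df1.drop(columns=['INDEX'], inplace=True)
    return df1

df_feature = feature_eng1(df_raw)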
Here df_raw no longer contains the column INDEX, whereas with feature_eng2 it would.
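A quick way to check this is to compare object identity and columns after each call. A minimal sketch, assuming pandas is installed; raw1, raw2 and their data are hypothetical:

```python
import pandas as pd

def feature_eng1(df):
    df.drop(columns=["INDEX"], inplace=True)
    return df

def feature_eng2(df):
    df1 = df.copy()
    df1.drop(columns=["INDEX"], inplace=True)
    return df1

raw1 = pd.DataFrame({"INDEX": [0, 1], "value": [10, 20]})
out1 = feature_eng1(raw1)
print(out1 is raw1)             # True: the same object is returned
print("INDEX" in raw1.columns)  # False: the raw frame lost the column

raw2 = pd.DataFrame({"INDEX": [0, 1], "value": [10, 20]})
out2 = feature_eng2(raw2)
print(out2 is raw2)             # False: a new object is returned
print("INDEX" in raw2.columns)  # True: the raw frame is intact
print("INDEX" in out2.columns)  # False: only the copy was changed
```

With feature_eng1, df_feature and df_raw end up being one and the same object, which is why later changes to either one also show up in the other.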