Is it redundant to use df.copy() when writing a function for feature engineering?

Question

I'm wondering if there's any benefit to writing this pattern

def feature_eng(df):
  df1 = df.copy()
  ...
  return df1

as opposed to this pattern

def feature_eng(df):
  ...
  return df

Depends on what happens in `...`. There might be explicit or implicit copying operations and/or (re)assignments in there. — timgeb, Nov 10 '20 at 07:46
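
To make the comment above concrete, here is a minimal sketch of how the body of the function changes the answer. The function names `feature_eng_reassign` and `feature_eng_assign`, the toy data, and the `value_sq` column are illustrative assumptions, not part of the question.

```python
import pandas as pd

def feature_eng_reassign(df):
    # drop() without inplace=True returns a new DataFrame, so rebinding the
    # local name `df` leaves the caller's object alone even without .copy().
    df = df.drop(columns=["INDEX"])
    return df

def feature_eng_assign(df):
    # Column assignment writes into the object that was passed in,
    # so the caller sees the new column.
    df["value_sq"] = df["value"] ** 2
    return df

# Toy data, assumed for illustration only.
df_raw = pd.DataFrame({"INDEX": [0, 1, 2], "value": [10, 20, 30]})

out1 = feature_eng_reassign(df_raw)
cols_after_reassign = list(df_raw.columns)  # df_raw still has INDEX

out2 = feature_eng_assign(df_raw)
cols_after_assign = list(df_raw.columns)    # df_raw gained value_sq
```

In this sketch, `.copy()` would be redundant in the first function, because every step already produces a new object, but not in the second, where one assignment is enough to change the caller's frame.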

score 1 · Accepted Answer · answered Nov 10 '20 at 07:42

Say you have a raw dataframe df_raw and you create df_feature with feature_eng. Inside the function, df refers to the same object as df_raw, so with the second pattern any in-place operation modifies df_raw itself when you call df_feature = feature_eng(df_raw). With the first pattern the function works on a copy, so df_raw stays as it was. If you want to keep df_raw unmodified, use the first pattern.

A little example:

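def feature_eng1(df):
    # Drops the column on the caller's DataFrame itself.
    df.drop(columns=['INDEX'], inplace=True)
    return df

def feature_eng2(df):
    # Works on a copy, so the caller's DataFrame is left untouched.
    df1 = df.copy()
    df1.drop(columns=['INDEX'], inplace=True)
    return df1

# df_raw is assumed to be an existing DataFrame with an 'INDEX' column.
df_feature = feature_eng1(df_raw)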

Here df_raw will no longer contain the column INDEX, whereas after calling feature_eng2 it still would.
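
The example can be checked end to end with a small toy frame. The data and the variable names `raw_a` and `raw_b` are assumptions for illustration; the two functions are repeated so the snippet runs on its own.

```python
import pandas as pd

def feature_eng1(df):
    df.drop(columns=["INDEX"], inplace=True)
    return df

def feature_eng2(df):
    df1 = df.copy()
    df1.drop(columns=["INDEX"], inplace=True)
    return df1

# Two identical toy frames, one per function, so the effects do not mix.
raw_a = pd.DataFrame({"INDEX": [0, 1, 2], "value": [10, 20, 30]})
raw_b = raw_a.copy()

out_a = feature_eng1(raw_a)
out_b = feature_eng2(raw_b)

same_object = out_a is raw_a           # feature_eng1 hands back its input
raw_a_columns = list(raw_a.columns)    # INDEX is gone from the input
raw_b_columns = list(raw_b.columns)    # INDEX is still in the input
out_b_columns = list(out_b.columns)    # only the returned copy lost INDEX
```

Note that df.copy() makes a deep copy by default (deep=True), so the data as well as the column labels are independent of the original frame.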

Great explanation, especially the ‘will overwrite’ statement. My brain hadn’t twigged until I read this. Thanks! — S3DEV, Nov 10 '20 at 08:34
