I'm wondering if there's any benefit to writing this pattern

def feature_eng(df):
  df1 = df.copy()
  ...
  return df1

as opposed to this pattern

def feature_eng(df):
  ...
  return df
Daniel Tan
    Depends on what happens in `...`. There might be explicit or implicit copying operations and/or (re)assignments in there. – timgeb Nov 10 '20 at 07:46
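To make the comment above concrete, here is a minimal sketch of two function bodies that behave differently with respect to the caller's dataframe. The function names and the sample columns `a`, `b`, and `total` are made up for illustration and are not from the question.

```python
import pandas as pd

def add_column_in_place(df):
    # Column assignment mutates the very object the caller passed in.
    df["total"] = df["a"] + df["b"]
    return df

def drop_with_rebinding(df):
    # drop() without inplace=True returns a new DataFrame; rebinding the
    # local name df leaves the caller's object untouched.
    df = df.drop(columns=["b"])
    return df

# Hypothetical sample data
df_raw = pd.DataFrame({"a": [1, 2], "b": [3, 4]})

out1 = add_column_in_place(df_raw)
mutated = "total" in df_raw.columns   # the caller's frame gained a column
same_object = out1 is df_raw          # the return value is the same object

out2 = drop_with_rebinding(df_raw)
still_has_b = "b" in df_raw.columns   # the caller's frame kept column b
```

So without an explicit `df.copy()`, whether the input is modified depends on which kind of operation the body uses.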

1 Answer

Say you have a raw dataframe df_raw and you create df_feature using feature_eng. If the body of feature_eng modifies df in place, the second pattern will also change df_raw when you call df_feature = feature_eng(df_raw), while the first pattern will not. So if you want to keep df_raw as it is, the first pattern is the safe choice.

A little example:

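import pandas as pd

def feature_eng1(df):
    # Drops the column on the caller's dataframe itself
    df.drop(columns=['INDEX'], inplace=True)
    return df

def feature_eng2(df):
    # Works on a copy, so the caller's dataframe is left untouched
    df1 = df.copy()
    df1.drop(columns=['INDEX'], inplace=True)
    return df1

# Example data (assumed for illustration)
df_raw = pd.DataFrame({'INDEX': [0, 1], 'x': [10, 20]})

df_feature = feature_eng1(df_raw)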

Here df_raw will no longer contain the column INDEX, whereas after calling feature_eng2 instead it still would.
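If you want to check this yourself, a self-contained sketch along these lines should do. The sample data and the `_a`/`_b` variable names are assumptions for illustration; the two functions are the ones from the example.

```python
import pandas as pd

def feature_eng1(df):
    df.drop(columns=["INDEX"], inplace=True)
    return df

def feature_eng2(df):
    df1 = df.copy()
    df1.drop(columns=["INDEX"], inplace=True)
    return df1

# Two identical raw frames (hypothetical data), one per function
df_raw_a = pd.DataFrame({"INDEX": [0, 1, 2], "value": [10.0, 20.0, 30.0]})
df_raw_b = df_raw_a.copy()

df_feature_a = feature_eng1(df_raw_a)
df_feature_b = feature_eng2(df_raw_b)

raw_a_columns = list(df_raw_a.columns)  # INDEX is gone from the input
raw_b_columns = list(df_raw_b.columns)  # the input is unchanged

print(raw_a_columns)  # ['value']
print(raw_b_columns)  # ['INDEX', 'value']
```

Note that feature_eng1 returns the same object it was given, so df_feature_a and df_raw_a are two names for one dataframe.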

mlang
    Great explanation, especially the ‘will overwrite’ statement. My brain hadn’t twigged until I read this. Thanks! – S3DEV Nov 10 '20 at 08:34