I have a large dataframe df (~100 columns and ~7 million rows) and I need to create ~50 new variables/columns, each a simple transformation of an existing variable. One way to proceed would be with many separate .apply statements (I'm just using transform* as a placeholder for simple transformations such as taking a max or squaring):
df['new_var1'] = df['old_var1'].apply(lambda x: transform1(x))
...
df['new_var50'] = df['old_var50'].apply(lambda x: transform50(x))
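For concreteness, here is a self-contained toy version of this first approach (the data and transform1 are made up; I'm assuming the real transforms are plain one-argument functions like squaring):

import pandas as pd
import numpy as np

# Toy stand-in for the real data; the actual df has ~100 columns and ~7M rows.
df = pd.DataFrame({'old_var1': np.arange(5), 'old_var2': np.arange(5) * 2.0})

def transform1(x):
    # Assumed transformation: squaring a scalar.
    return x ** 2

df['new_var1'] = df['old_var1'].apply(lambda x: transform1(x))
# The lambda just forwards its argument, so this is equivalent and a bit cleaner:
df['new_var1'] = df['old_var1'].apply(transform1)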
Another way would be to first create a dictionary of transformations:

transform_dict = {
    'new_var1': lambda row: transform1(row),
    ...,
    'new_var50': lambda row: transform50(row),
}
and then write a single .apply combined with .concat:

df = pd.concat(
    [df, df.apply(lambda r: pd.Series({var: fn(r) for var, fn in transform_dict.items()}), axis=1)],
    axis=1,
)
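Again as a runnable toy sketch (the two transforms below are invented; note that with axis=1 each lambda receives an entire row as a Series, unlike the per-scalar .apply on a single column in the first approach):

import pandas as pd
import numpy as np

df = pd.DataFrame({'old_var1': np.arange(5), 'old_var2': np.arange(5) * 2.0})

# Invented row-wise transforms standing in for transform1..transform50.
transform_dict = {
    'new_var1': lambda row: row['old_var1'] ** 2,
    'new_var2': lambda row: row['old_var1'] + row['old_var2'],
}

# axis=1 passes each row as a Series; the returned Series become the new columns.
new_cols = df.apply(lambda r: pd.Series({var: fn(r) for var, fn in transform_dict.items()}), axis=1)
df = pd.concat([df, new_cols], axis=1)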
Is one method preferred over the other, whether in how 'Pythonic' it is or in efficiency, scalability, or flexibility?