I have the following dataframe:
import pandas as pd
import numpy as np
base_df = pd.DataFrame({
'id': [1,2,3],
'base_value': [100, None, 123.1]
})
id base_value
1 100.0
2 NaN
3 123.1
There's also another one:
extended_df = pd.DataFrame({
'id': [1, 1, 2, 2, 3],
'ext_id': [100, 500, 90, 1, 1000],
'role_1': [True, False, False, True, True],
'role_2': [False, True, True, False, False],
'ext_value': [10, 5, 21, 200, 500]
})
id ext_id role_1 role_2 ext_value
1 100 True False 10
1 500 False True 5
2 90 False True 21
2 1 True False 200
3 1000 True False 10
I would like to get the following dataframe as a result:
result_df = pd.DataFrame({
'id': [1,2,3],
'base_value': [100, None, 123.1],
'ext_value_role_1': [10, 200, 500],
'ext_value_role_2' :[5, 21, None]
})
id base_value ext_value_role_1 ext_value_role_2
1 100.0 10 5.0
2 NaN 200 21.0
3 123.1 500 NaN
The logic behind the transformation is as follows. For each row in base_df
find matches in extended_df
. The resulting dataframe will contain as many additional columns (excluding id
, base_value
) as there are role_*
columns times number of ext_value
columns in extended_df
. If id
's role is set to False
the resulting value in column is None
.
I've come up with the code posted below. It works for the small example but I'm hoping to get something more pandas
esque.
def multiply_by_boolean(boolean_value, variable):
if boolean_value is False:
return None
elif boolean_value is None:
raise ValueError('boolean_value cannot be None')
return variable
extended_df['ext_value_role_1'] = extended_df.apply(lambda x: multiply_by_boolean(x['role_1'], x['ext_value']), axis=1)
extended_df['ext_value_role_2'] = extended_df.apply(lambda x: multiply_by_boolean(x['role_2'], x['ext_value']), axis=1)
res_df = extended_df[['id', 'ext_value_role_1', 'ext_value_role_2']]
res_df = res_df.groupby('id').agg('max')
res_df = res_df.merge(right=base_df, left_index=True, right_on='id')