I have a DataFrame as follows.
d = {}
d['Model'] = ['M1', 'M2'] * 4 * 3 * 5
d['Support'] = ['S1', 'S1', 'S2', 'S2'] * 2 * 3 * 5
d['Group'] = ['G1', 'G2', 'G2'] * 2 * 4 * 5
d['Case'] = ['C11', 'C21', 'C22', 'C31', 'C32'] * 2 * 4 * 3
val1 = []
val2 = []
random.seed(0)
for i in range (2*4*3*5):
val1.append(random.randrange(-10, 11))
val2.append(random.randrange(-10, 11))
d['val1'] = val1
d['val2'] = val2
df = pd.DataFrame(d)
I am looking for the maximum value of the sum of the positive values for each Support, Group and Case and with reference to the Model and the sum of the minimum values of the sum of the negative values.
This is my attempt:
df1 = df.groupby(['Model', 'Support', 'Group', 'Case'])[['val1', 'val2']].sum()
df2 = df1.groupby(['Model', 'Support', 'Group', 'Case']).agg([
('max' , lambda x : x[x > 0].sum()),
('min' , lambda x : x[x < 0].sum())
])
df3 = df2.groupby(['Support', 'Group', 'Case']).agg([
('max' , max),
('Model', lambda x: x.idxmax()[0]),
('min' , min),
('Model', lambda x: x.idxmin()[0]),
])
(See edition of 14 Jan 2021)
The results of the last DataFrame df3
are fine, but this is not the output format I want.
I need to filter the last DataFrame df3
to get the results this way:
Edited 14 Jan 2021
Reviewing the @aneroid results I have seen that the df3 values are not as expected.
For Support S1
, Group G1
, Case C11
this will be the result:
df_M1S1G1_1 = df.loc[df['Model'] == 'M1']
df_M1S1G1_2 = df_M1S1G1_1.loc[df['Support'] == 'S1']
df_M1S1G1_3 = df_M1S1G1_2.loc[df['Group'] == 'G1']
Model Support Group Case val1 val2
0 M1 S1 G1 C11 2 3
12 M1 S1 G1 C22 -8 0
24 M1 S1 G1 C32 5 0
36 M1 S1 G1 C21 7 -4
48 M1 S1 G1 C31 -6 -9
60 M1 S1 G1 C11 10 0
72 M1 S1 G1 C22 10 -4
84 M1 S1 G1 C32 -8 -3
96 M1 S1 G1 C21 -5 0
108 M1 S1 G1 C31 -9 7
df_M2S1G1_1 = df.loc[df['Model'] == 'M2']
df_M2S1G1_2 = df_M2S1G1_1.loc[df['Support'] == 'S1']
df_M2S1G1_3 = df_M2S1G1_2.loc[df['Group'] == 'G1']
Model Support Group Case val1 val2
9 M2 S1 G1 C32 -2 7
21 M2 S1 G1 C21 -10 -8
33 M2 S1 G1 C31 -1 7
45 M2 S1 G1 C11 9 -2
57 M2 S1 G1 C22 -8 0
69 M2 S1 G1 C32 10 7
81 M2 S1 G1 C21 -10 7
93 M2 S1 G1 C31 5 8
105 M2 S1 G1 C11 -6 7
117 M2 S1 G1 C22 -8 -10
Therefore:
0 M1 S1 G1 C11 val1 = 2 val2 = 3
60 M1 S1 G1 C11 val1 = 10 val2 = 0
val1_sum_pos = 12
val1_sum_neg = 0
val2_sum_pos = 3
val2_sum_neg = 0
45 M2 S1 G1 C11 val1 = 9 val2 = -2
105 M2 S1 G1 C11 val1 = -6 val2 = 7
val1_sum_pos = 9
val1_sum_neg = -6
val2_sum_pos = 7
val2_sum_neg = -2
And as a result:
val1
val2
max Model_max min Model_min max Model_max min Model_min
Support Group Case
S1 G1 C11 12 M1 -6 M2 7 M2 -2
M2
These results are in line with the solution proposed by @aneroid.