I have one categorical variable and two numeric cols:
np.random.seed(123)
df = pd.DataFrame({'group' : ['a']*10+['b']*10,
'var1' : np.random.randn(20),
'var2' : np.random.randint(10,size=20)})
I want to find, by group
, the mean of var1
after filtering to constrict df
to the top-quartile var2
values by group. That is, the threshold for each group would be:
thresh = df.groupby('group')['var2'].quantile(0.75)
Here's a loopy demonstration of what I want to end up with:
for group, frame in df.groupby('group'):
print(frame[frame.var2 >= frame.var2.quantile(0.75)].var1.mean())
# -1.4713362407192072
# 0.15512098976530683
The result should be a DataFrame with (group
, var
) columns or a Series indexed by group
. I believe the solution may involve .transform
/.apply
but am stuck there.