I have a cross-sectional dataset with many variables, including "wealth". I created a decile variable that divides wealth into 10 groups using the following:
df['decile'] = pd.qcut(df['wealth'], 10, labels=False)
My df looks like this (note that some variables contain NaNs):
ID  wealth   age  income  ... many more variables ...  decile
A   10000    30   4000                                 5
B   10       19   500                                  1
C   1000000  37   6000                                 9
D   2842     22   0                                    4
E   399932   44   NaN                                  8
F   2344     19   0                                    4
G   5000     18   0                                    4
H
I
..
I want to create summary statistics for variables of my choosing, comparing the bottom decile (decile=0) with the top decile (decile=9), and display the mean, median and std for each.
Desired output:
         bottom decile          top decile             difference in means
         mean  median  std      mean  median  std
wealth   ..    ..      ..       ..    ..      ..       ..  *** (if statistically significant)
age      ..    ..      ..       ..    ..      ..       ..  **
income   ..    ..      ..       ..    ..      ..
..
..
..
Is there an easy way to do this in Python, instead of having to calculate each statistic individually?
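For reference, here is a minimal sketch of one way such a table could be built. It is an illustration under assumptions, not a definitive solution: the helper name decile_summary, the synthetic demo data, the choice of Welch's t-test (scipy.stats.ttest_ind with equal_var=False) for the difference in means, and the star cutoffs (0.01, 0.05, 0.10) are all hypothetical choices that do not come from the question. NaNs are skipped by pandas' mean/median/std and dropped per variable before the test.

```python
import numpy as np
import pandas as pd
from scipy import stats


def decile_summary(df, variables, decile_col="decile", bottom=0, top=9):
    """Hypothetical helper: compare two decile groups on the chosen variables."""
    low = df.loc[df[decile_col] == bottom, variables]
    high = df.loc[df[decile_col] == top, variables]

    # mean/median/std ignore NaNs by default; transpose so variables are rows
    out = pd.concat(
        {
            "bottom decile": low.agg(["mean", "median", "std"]).T,
            "top decile": high.agg(["mean", "median", "std"]).T,
        },
        axis=1,
    )

    diffs, stars = [], []
    for var in variables:
        a = low[var].dropna()
        b = high[var].dropna()
        # Welch's t-test (unequal variances); an assumed choice of test
        _, p = stats.ttest_ind(b, a, equal_var=False)
        diffs.append(b.mean() - a.mean())
        # Assumed significance cutoffs for the star labels
        stars.append("***" if p < 0.01 else "**" if p < 0.05 else "*" if p < 0.10 else "")

    out[("difference in means", "diff")] = diffs
    out[("difference in means", "sig")] = stars
    return out


# Synthetic demo data (made up for illustration, not the data from the question)
rng = np.random.default_rng(0)
demo = pd.DataFrame({
    "wealth": rng.lognormal(8, 2, 500),
    "age": rng.integers(18, 80, 500).astype(float),
})
demo.loc[demo.index[::50], "age"] = np.nan  # inject some NaNs
demo["decile"] = pd.qcut(demo["wealth"], 10, labels=False)

table = decile_summary(demo, ["wealth", "age"])
print(table)
```

The result has one row per variable and two-level column headers (group, statistic), which is close to the layout above. A different test (for example a rank-based one) could be swapped in inside the loop if the variables are heavily skewed.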