import pandas as pd
import numpy as np
a = list("ABC") * 4
value = np.random.randint(-5, 5, 12)
df = pd.DataFrame({"A": a, "value": value})
print(df)
A value
0 A -5
1 B 1
2 C 0
3 A 0
4 B -4
5 C -1
6 A 2
7 B 4
8 C -5
9 A 0
10 B 1
11 C -1
My goal is to create a third column with quantile labels based on custom quantiles ranges. These ranges are calculated on GroupBy objects based the first column. My use case is to calculate the bottom 10% and the top 10% values (i.e. deciles) for each group and then label them in a new column accordingly: 'bottom_decile', 'mid_deciles', 'top_decile'
Desired outcome:
A value C
0 A -5 bottom_decile
1 B 1 mid_deciles
2 C 0 top_decile
3 A 0 mid_deciles
4 B -4 bottom_decile
5 C -1 mid_deciles
6 A 2 top_decile
7 B 4 top_decile
8 C -5 bottom_decile
9 A 0 mid_deciles
10 B 1 mid_deciles
11 C -1 mid_deciles
This is my attempt:
df['C'] = df.groupby(['A'])['value'].transform(lambda x: pd.qcut(x, [0, 0.1, 0.9, 1], labels=['bottom_decile', 'mid_deciles', 'top_decile']))
However, the result raises an error:
ValueError: Bin edges must be unique: array([-5. , -3.8, 2. , 2. ]).
You can drop duplicate edges by setting the 'duplicates' kwarg