I want to divide my data into quantiles (quintiles to be more specific). However, when using qcut in Pandas or ntile in R, I either get the error that "bin edges must be unique" (qcut) or equal values get assigned to different quintiles (ntile).
Looking at the following data: [1, 8, 2, 1, 8] I would like the ouput to be:
- 1 assigned to quintile 1
- 8 assigned to quintile 5
- 2 assigned to quintile 3
- 1 assigned to quintile 1 again
- 8 assigned to quintile 5 again
In this specific case the 1s could also be assigned to quintile 2 instead of 1 (depending on definition). It is just important that equal values are assigned to the same quintile.
Do you have any ideas on how I can achieve this?
Thank you very much for your input!
Some data to use:
import pandas as pd
df = pd.DataFrame({'Date': ['2011-01-01', '2011-01-02', '2011-01-03','2011-01-01', '2011-01-02',
'2011-01-03','2011-01-01', '2011-01-02', '2011-01-03', '2011-01-01','2011-01-02', '2011-01-03',
'2011-01-01', '2011-01-02', '2011-01-03'], 'Name': ['A', 'A', 'A', 'B', 'B', 'B', 'C', 'C', 'C', 'D', 'D', 'D', 'E', 'E', 'E'],
'Value':[1, 3, 2, 8, 6, 5, 2, 10, 4, 1, 5, 3, 8, 4, 9]})
Code I used in R:
df = df %>% group_by(Date) %>% mutate(Quint = ntile(Value, 5))
This assigned qual values to different bins.
Code I used in Python:
df['Quintiles'] = df.groupby(by=["Date"])['Value'].transform(lambda x: pd.qcut(x, 5, labels=["XS", "S", "M", "L", "XL"]))
This resulted in the error of bin edges having to be unique.