2

I have a DataFrame with the following structure:

df = pd.DataFrame({'ID': np.random.randint(1, 13, size=1000),
                   'VALUE': np.random.randint(0, 300, size=1000)})

How can I plot a graph with percentiles (10%, 20%, ..., 90%) on the x-axis and, on the y-axis, the number of values that lie between consecutive percentile ticks (for example, between 20% and 30%)? There must be a separate plot for every ID, each with its own percentile values.

I've computed the percentiles, but now I'm stuck:

q = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8]
df.groupby('ID')['VALUE'].quantile(q)

I guess the plot should look like a histogram of the VALUE column, but with percentages on the x-axis instead of numeric values.

Denis Ka

2 Answers

2
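import matplotlib.pyplot as plt
import pandas as pd

q = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8]

for name, group in df.groupby('ID'):  # Group by the ID column
    # Per-group bin edges at the requested quantiles; duplicates='drop' avoids
    # "Bin edges must be unique" when a group has repeated quantile values
    _, bins = pd.qcut(group.VALUE, q, retbins=True, duplicates='drop')
    plt.figure()
    ax = group.VALUE.hist(bins=bins, grid=False)  # Histogram of the data with the quantile bins
    ax.set_title(f'ID {name}')
    if len(bins) == len(q):  # Label the edges as percentages only if no edge was dropped
        ax.set_xticks(bins)
        ax.set_xticklabels([f'{x:.0%}' for x in q])
    plt.show()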

I'm not including the output plots here because there are a lot of them. This produces the plot you want, but you will also need to adapt the ticks and formatting.

To get a normalized plot, with the y-axis ranging from 0 to 100%, you would need to normalize your data before plotting (maybe something like group.VALUE.count() / df.VALUE.count()).
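One possible way to do that normalization is sketched below. It assumes that "normalized" means each bar shows the share of that ID's own values falling between two consecutive quantile edges; dividing by the size of the whole frame instead would be an equally valid reading. The helper names `normalized_bin_shares` and `plot_shares` are made up for this illustration, and the random data only mirrors the frame from the question.

```python
import numpy as np
import pandas as pd


def normalized_bin_shares(values, q):
    """Return (edges, shares): quantile bin edges and the fraction of `values` in each bin."""
    values = np.asarray(values, dtype=float)
    # np.unique sorts the edges and drops duplicates, which np.histogram needs
    edges = np.unique(np.quantile(values, q))
    counts, edges = np.histogram(values, bins=edges)
    # Assumption: normalize by the size of this group, not of the whole frame
    shares = counts / len(values)
    return edges, shares


def plot_shares(edges, shares, title):
    """Draw the shares as percentage bars spanning their quantile intervals."""
    import matplotlib.pyplot as plt  # imported lazily; only needed for plotting

    fig, ax = plt.subplots()
    ax.bar(edges[:-1], shares * 100, width=np.diff(edges), align='edge', edgecolor='black')
    ax.set_ylabel('% of values in this ID')
    ax.set_title(title)
    return ax


# Stand-in data shaped like the frame in the question
rng = np.random.default_rng(0)
df = pd.DataFrame({'ID': rng.integers(1, 13, size=1000),
                   'VALUE': rng.integers(0, 300, size=1000)})

q = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8]
shares_by_id = {name: normalized_bin_shares(group['VALUE'], q)
                for name, group in df.groupby('ID')}
```

Calling `plot_shares(*shares_by_id[name], title=f'ID {name}')` for each `name` would then give one figure per ID. Values below the first or above the last quantile edge are not counted, so the bars do not add up to 100% with this `q`.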

flurble
2

Try:

df['Quantile'] = pd.qcut(df.VALUE, q=np.arange(0,1.1,0.1))
tmp_df = df.pivot_table(index='Quantile', columns='ID', aggfunc='count')
tmp_df.plot(kind='bar', subplots=True, figsize=(10,10))
plt.show()

Output: each subplot shows the count of values per quantile bin for one ID.


Quang Hoang
  • What is Quantile? In every ID there must be individual values for every percentile – Denis Ka May 15 '19 at 12:27
  • So the difficulty is in grouping by ID: when I try to qcut any of the groups, I get a ValueError "Bin edges must be unique", because of the small number of values in some groups – Denis Ka May 15 '19 at 12:49
  • Review your bins, maybe `bins = sorted(list(set(bins)))`. The other statement is not a problem; it will show as `NaN` in the grouping and `0` on the plot. – Quang Hoang May 15 '19 at 12:58
  • @Quang Hoang...nice one!! – ASH Aug 20 '19 at 01:02
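
The de-duplication suggested in the comments could look roughly like the sketch below. It computes the quantile edges for a single group, removes repeated edges, and counts the values between the remaining ones. The helper name `counts_between_quantiles` and the small series are invented for illustration. `pd.qcut` also accepts `duplicates='drop'`, which should achieve a similar effect.

```python
import numpy as np
import pandas as pd


def counts_between_quantiles(values, q):
    """Count values between consecutive, de-duplicated quantile edges of `values`."""
    # Repeated quantile values collapse to one edge, as suggested in the comment
    edges = sorted(set(values.quantile(q)))
    # include_lowest=True keeps the minimum value inside the first interval
    binned = pd.cut(values, bins=edges, include_lowest=True)
    # sort=False keeps the intervals in their natural (ascending) order
    return binned.value_counts(sort=False)


# Made-up small group with many repeated values, the case where several
# quantile edges coincide
small_group = pd.Series([5, 5, 5, 5, 5, 5, 7, 9, 12, 20])
q = np.linspace(0, 1, 11)  # deciles, including the 0% and 100% edges

counts = counts_between_quantiles(small_group, q)
print(counts)
```

To apply this per ID, one could loop over `df.groupby('ID')` and call the helper on each group's `VALUE` column. Because edges can be dropped, different IDs may end up with different numbers of bins.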