
[Image: histogram bar plot and cumulative histogram curve]

I am looking for a Python function that returns a cumulative frequency curve sampled at regularly spaced frequencies (y axis) rather than at regularly spaced values (x axis). In this image, the dots are regularly spaced along the x axis; I would like them to be regularly spaced along the y axis.

The output of the function would be the regularly spaced percentiles, from 0 to 100 in steps of n, and the values corresponding to those percentiles.

It would correspond to scipy.stats.cumfreq, but with numbins applying to the y axis (frequencies or percentages) instead of the x axis (values).

This function is a draft of what I am looking for:

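import numpy as np

def cumfreq_even_freq(array, nbins):
    # Flatten and sort a copy; assumes nbins <= number of elements
    array = np.sort(np.asarray(array).flatten())
    step = len(array) // nbins  # integer step so it can be used as an index
    # Cumulative percentage reached at the end of each bin
    percents = [100 * (i + 1) * step / len(array) for i in range(nbins)]
    # Largest value in each bin (last index belonging to the bin)
    values = [array[(i + 1) * step - 1] for i in range(nbins)]
    return percents, values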

1 Answer


As a very rough version, you can use pandas' qcut:

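import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# toy data
np.random.seed(1)
a = np.random.rand(100)

# Quantile cut into 10 bins (11 edges, from the 0th to the 100th percentile)
cuts = (pd.qcut(a, np.linspace(0, 1, 11))    # change the quantiles to your liking
          .value_counts().cumsum()
       )

# x: right edge of each quantile bin, y: cumulative count
plt.plot([interval.right for interval in cuts.index], cuts, marker='s')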

Output:

[Image: cumulative count plotted against the right edge of each quantile bin]
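
The regularly spaced percentiles described in the question can also be read off directly with NumPy's np.percentile, which skips the binning step. This is only a minimal sketch and not part of the original answer: the function name cumfreq_even_percent and its step parameter are hypothetical, and step is assumed to be a positive integer.

```python
import numpy as np

def cumfreq_even_percent(array, step=10):
    """Return evenly spaced percentile levels and the matching data values.

    Hypothetical helper: `step` is assumed to be a positive integer.
    """
    data = np.asarray(array, dtype=float).ravel()
    # Percentile levels 0, step, 2*step, ...; 100 is included only
    # when step divides 100 evenly
    percents = np.arange(0, 101, step)
    # np.percentile interpolates linearly between sorted samples by default
    values = np.percentile(data, percents)
    return percents, values

# toy data, same seed as in the qcut example
np.random.seed(1)
a = np.random.rand(100)
percents, values = cumfreq_even_percent(a, step=10)
```

Plotting values on the x axis against percents on the y axis, for example with plt.plot(values, percents, marker='s'), should give a cumulative curve whose points are evenly spaced along the y axis.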

Quang Hoang