I have a list of points. I want to split them into histogram bins so that every bin holds the same number of points. Ideally nothing would be hard-coded, so that in the future I could put uneven numbers of points into any number of bins.

Then I want to find the mean for each bin and plot these values on the same graph.

I should end up with either a scatter plot or histogram of the total data with the specific means plotted on top of that.

I found some code that puts equal amounts of data into bins, but I am struggling to move forward from there.

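import numpy as np
import matplotlib.pyplot as plt

data = [list of 1103 points]
b = number of bins

def histedges(data, binss):
    # Interpolate over the sorted data so each bin gets roughly the same count
    n = len(data)
    return np.interp(np.linspace(0, n, binss + 1),
                     np.arange(n), np.sort(data))

n, bins, patches = plt.hist(data, histedges(data, b))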
  • Does the bin size matter? Do the bins all have to be the same size? Let's consider this example: `input = [1, 100, 100, 100, 100, 102, 102, 101, ... numbers larger than 100]`. You can't actually have bins of equal size, because you have to include the first small value, 1, in a bin. The only possibility with this input is to have bins with 1 element each, but then you can't deal with the repeated 100s or the double 102. Thus, you also need to define what condition you want to apply to the bin widths. – Mathieu Oct 01 '19 at 17:37
  • What you're asking for isn't actually a histogram. The purpose of a histogram is to approximate the underlying probability distribution. If every bin contains the same number of points, then the probability of falling into a given bin is uniform, even if the underlying data is not. – jimijimjim Oct 01 '19 at 17:45
  • What is the question that you're trying to answer? – jimijimjim Oct 01 '19 at 17:46
  • The bins need to be as close to the same size as possible, so it is okay if they are slightly uneven. I would assume that a few extra values end up in the last bin. The histogram is not the goal; the mean value of each bin is the goal. The distribution will be the same, but the values within each bin will not be the same. So when I plot the mean values from the different bins, the result would be almost linear (depending on the data). I am trying to plot these mean values –  Oct 01 '19 at 18:13
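
One possible way to get the per-bin means described in the last comment, offered as a minimal sketch rather than a definitive answer: sort the values, cut them into chunks whose sizes differ by at most one, and average each chunk. The helper names `equal_count_bins`, `bin_means` and `plot_means` are hypothetical and do not come from the question. Note one assumption that differs from the comment: leftover points go into the first bins, not the last. The plotting helper draws the sorted data as a scatter plot against rank (one of the two plot types the question allows) and needs matplotlib; the binning itself uses only the standard library.

```python
from statistics import mean


def equal_count_bins(data, nbins):
    """Sort data and split it into nbins chunks whose sizes differ by at most one."""
    if not 1 <= nbins <= len(data):
        raise ValueError("nbins must be between 1 and len(data)")
    ordered = sorted(data)
    base, extra = divmod(len(ordered), nbins)
    chunks, start = [], 0
    for i in range(nbins):
        # Assumption: the first `extra` chunks each take one leftover point.
        size = base + (1 if i < extra else 0)
        chunks.append(ordered[start:start + size])
        start += size
    return chunks


def bin_means(data, nbins):
    """Mean of each equal-count chunk, in ascending order of value."""
    return [mean(chunk) for chunk in equal_count_bins(data, nbins)]


def plot_means(data, nbins):
    """Scatter the sorted data against rank and overlay the chunk means."""
    import matplotlib.pyplot as plt  # imported here so the binning works without it

    chunks = equal_count_bins(data, nbins)
    centers, start = [], 0
    for chunk in chunks:
        # Place each mean at the middle rank of its chunk.
        centers.append(start + (len(chunk) - 1) / 2)
        start += len(chunk)

    fig, ax = plt.subplots()
    ax.scatter(range(len(data)), sorted(data), s=5, label="sorted data")
    ax.plot(centers, [mean(c) for c in chunks], "ro-", label="bin means")
    ax.set_xlabel("rank")
    ax.set_ylabel("value")
    ax.legend()
    plt.show()
```

Calling `plot_means(data, b)` with the question's `data` and `b` would draw both layers on one set of axes. Because the split is done by rank rather than by value, repeated values such as the several 100s in the first comment simply land in the same or neighbouring chunks. To allow uneven bins later, the `size` computation could be replaced by an explicit list of chunk sizes.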

0 Answers