How do I count data in each bin to get the PDF of a specific dataset?

Question

I have a 1024x1024 array of values between 0 and 1, and I want to divide them into bins of width 0.05 (the first bin centered at 0, the second at 0.05, etc.) to compute the PDF of my data set. I'm having trouble counting the number of data points in each bin. I've tried using np.histogram, but the counts array I get back has a different size than my actual number of bins.

I've also tried to calculate the PDF with a for loop, but I don't think that approach is correct either. I would appreciate any suggestions that may help.

My code at the moment looks something like this:

bins = np.arange(-0.025, 1.05, .05)
bin_width = bins[1]-bins[0]
num_bins = 22
counts, bin_edges = np.histogram(Flux, bins=bins, 
     range=(-0.025, bin_width*num_bins), density=False)
PDF = counts / (np.sum(counts)*Delta_F) #Delta_F = 0.05 is a normalisation factor

Any help would be much appreciated, thank you. :)

Welcome to [Stack Overflow](https://stackoverflow.com/ "Stack Overflow"). In order for us to help you, provide a minimal reproducible problem set consisting of sample input, expected output, actual output, and all relevant code necessary to reproduce the example. What you have provided falls short of this goal. Please edit your question to show a minimal reproducible set. See [Minimal Reproducible Example](https://stackoverflow.com/help/minimal-reproducible-example "Minimal Reproducible Example") for details. As a minimum please define ```Flux```, ```bind_edges```. — itprorh66, Nov 17 '21 at 22:06

score 0 · Answer 1 · answered Nov 18 '21 at 08:46

TL;DR: your bins array contains 22 edge values, which define 21 intervals, so you get 21 count values. Everything is as expected.
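The edge/bin relationship can be checked directly. This is a minimal sketch, assuming uniformly distributed placeholder data (the name `sample` is hypothetical and only stands in for the asker's `Flux`):

```python
import numpy as np

# Edges spaced 0.05 apart, shifted by half a bin so centers fall on 0, 0.05, ..., 1.0
edges = np.arange(-0.025, 1.05, 0.05)

# Placeholder data standing in for Flux (an assumption, not the asker's data)
rng = np.random.default_rng(0)
sample = rng.uniform(0.0, 1.0, size=1000)

counts, returned_edges = np.histogram(sample, bins=edges)

print(edges.size)   # number of edges
print(counts.size)  # number of bins, always one less than the number of edges
```

Whatever the data, `counts.size` equals `edges.size - 1`, because each bin is the interval between two consecutive edges.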

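A quick way to get an idea of the PDF is the density=True parameter of np.histogram. With default arguments, as in the call below, it returns raw counts in 10 equal-width bins spanning the data range: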

import numpy as np

rng = np.random.RandomState(10)  # deterministic random data
Flux = np.hstack((rng.normal(size=1024), rng.normal(size=1024)))
np.histogram(Flux)

> (array([  6,  24, 110, 253, 410, 517, 406, 223,  70,  29]),
> array([-3.31766905, -2.70658557, -2.09550209, -1.48441861, -0.87333513,
        -0.26225165,  0.34883184,  0.95991532,  1.5709988 ,  2.18208228,
         2.79316576]))
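The output above contains raw counts. As a hedged sketch of what density=True changes, the same data can be passed with that flag, after which the values times the bin widths should sum to 1 (the variable names `density`, `widths` and `area` are illustrative):

```python
import numpy as np

rng = np.random.RandomState(10)  # same deterministic data as above
Flux = np.hstack((rng.normal(size=1024), rng.normal(size=1024)))

# density=True divides each count by (total count * bin width)
density, edges = np.histogram(Flux, density=True)

widths = np.diff(edges)           # width of each of the 10 default bins
area = np.sum(density * widths)   # integral of the estimated PDF
print(area)
```

The normalization applies only to the data that falls inside the bins, so the area is 1 over the binned range.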

With the bins you declared, a reproducible example looks like this:

import numpy as np

rng = np.random.RandomState(10)  # deterministic random data
Flux = np.hstack((rng.normal(size=1024), rng.normal(size=1024)))
bins = np.arange(-0.025, 1.05, .05)  # 22 edges -> 21 bins
bin_width = bins[1] - bins[0]
num_bins = 22
Delta_F = 0.05
# Note: range has no effect here, because bins is already an array of edges
counts, bin_edges = np.histogram(Flux, bins=bins,
     range=(-0.025, bin_width*num_bins), density=False)
PDF = counts / (np.sum(counts)*Delta_F)  # Delta_F = 0.05 is a normalisation factor
print(f"length PDF = {PDF.size}, \n PDF = {PDF} \n")
print(f"length bins = {bins.size}, \n bins = {bins}")

> length PDF = 21, 
> PDF = [0.85561497 1.25668449 0.90909091 1.25668449 1.25668449 1.22994652
 1.01604278 1.17647059 0.98930481 0.93582888 1.3368984  0.90909091
 0.93582888 0.82887701 0.82887701 0.80213904 0.80213904 0.72192513
 0.56149733 0.72192513 0.6684492 ] 

> length bins = 22, 
> bins = [-0.025  0.025  0.075  0.125  0.175  0.225  0.275  0.325  0.375  0.425
  0.475  0.525  0.575  0.625  0.675  0.725  0.775  0.825  0.875  0.925
  0.975  1.025]
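For the layout described in the question (bins centered at 0, 0.05, ..., 1), one possible sketch builds the 21 centers and 22 edges with np.linspace and compares the manual normalization with density=True. The uniform `Flux` array below is a smaller placeholder for the 1024x1024 data, and the names `centers`, `edges`, `pdf_manual` and `pdf_numpy` are illustrative:

```python
import numpy as np

# Placeholder for the asker's data (assumption): 2-D array of values in [0, 1)
rng = np.random.default_rng(0)
Flux = rng.uniform(0.0, 1.0, size=(256, 256))

bin_width = 0.05
centers = np.linspace(0.0, 1.0, 21)                           # 21 bin centers
edges = np.linspace(-bin_width / 2, 1.0 + bin_width / 2, 22)  # 22 bin edges

# np.histogram works on the flattened array, so 2-D input is fine
counts, _ = np.histogram(Flux, bins=edges)

# Manual normalization: counts / (N * bin width)
pdf_manual = counts / (counts.sum() * bin_width)

# Same normalization done by NumPy
pdf_numpy, _ = np.histogram(Flux, bins=edges, density=True)

print(counts.size, centers.size)
print(np.allclose(pdf_manual, pdf_numpy))
```

With this layout the first and last bins extend 0.025 beyond the [0, 1] data range, so only half of each can contain data, and the estimated PDF there will tend to be lower than in the interior bins.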
