I am trying to compute the cumulative distribution function for a set of values.

I computed the histogram using GSL and tried to compute the CDF from it, but the values seem to be shifted by one position.

This is the code I am using:

gHist = gsl_histogram_alloc((maxRange - minRange) / 5);
gsl_histogram_set_ranges_uniform(gHist, minRange, maxRange);

for (int j = 0; j < ValidDataCount; j++)
    gsl_histogram_increment(gHist, ValAdd[j]);

gsl_histogram_pdf *p = gsl_histogram_pdf_alloc(gsl_histogram_bins(gHist));
gsl_histogram_pdf_init(p, gHist);

for (int j = 0; j < gsl_histogram_bins(gHist) + 1; j++)
    printf("%f ", p->sum[j]);

The histogram looks like this: 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 ... and so on. There are 20 values in total.

And the CDF is: 0.00 0.05 0.05 0.05 0.05 0.05 0.05 0.05 0.05 0.05 0.05 0.05 0.05 0.05 0.05 0.05 0.1 0.1 ...

Why is there a 0 in the first position? Shouldn't it start with 0.05?

Thank you.

DCuser

1 Answer

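GSL allocates sum as an array of n+1 entries, where n is the number of bins, even though only n values are needed for the distribution itself. The extra element exists because GSL defines sum[0] = 0, and the cumulative value up to and including bin i is stored in sum[i + 1]. So the 0 you see first is sum[0], and the 0.05 you expected is sum[1].

In the GSL source file "pdf.c" you can see this: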

gsl_histogram_pdf *gsl_histogram_pdf_alloc (const size_t n)
{
  (...)
  p->sum = (double *) malloc ((n + 1) * sizeof (double));
}

int gsl_histogram_pdf_init (gsl_histogram_pdf * p, const gsl_histogram * h)
{
  (...)
  p->sum[0] = 0;
  for (i = 0; i < n; i++)
    {
      sum += (h->bin[i] / mean) / n;
      p->sum[i + 1] = sum;
    }
}
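
To make the indexing concrete, here is a minimal standalone sketch that mirrors the sum layout without calling GSL. The helpers cumulative_sums and print_cdf_per_bin are hypothetical names of my own, not part of the library, and the sketch divides by the total count directly, which should be arithmetically equivalent to the mean-based expression in the excerpt above but is not the library's code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper (NOT a GSL function): fills sum[0..n] with the
   running fraction of counts, using the same n+1 layout as p->sum. */
void cumulative_sums(const double *bin, size_t n, double *sum)
{
    assert(n > 0);

    double total = 0.0;
    for (size_t i = 0; i < n; i++)
        total += bin[i];
    assert(total > 0.0);

    sum[0] = 0.0;                              /* nothing accumulated yet */
    for (size_t i = 0; i < n; i++)
        sum[i + 1] = sum[i] + bin[i] / total;  /* includes bins 0..i */
}

/* Hypothetical helper: prints one CDF value per bin by skipping the
   leading sum[0] = 0 entry. */
void print_cdf_per_bin(const double *sum, size_t n)
{
    for (size_t i = 0; i < n; i++)
        printf("%.2f ", sum[i + 1]);
    printf("\n");
}
```

Applied to the code in the question, this suggests looping j from 0 to gsl_histogram_bins(gHist) - 1 and printing p->sum[j + 1], so that the first printed value is the one that already includes the first bin.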
Vivian Miranda