2

I want to get the probability to get a value X higher than x_i, which means the cumulative distribution functions CDF. P(X>=x_i). I've tried to do it in Matlab with this code.

Let's assume the data is in the column vector p1.

   xp1 = linspace(min(p1), max(p1));   %range of bins  
   histp1 = histc(p1(:), xp1);      %histogram od data 
   probp1 = histp1/sum(histp1);     %PDF (probability distribution function)  
   `figure;plot(probp1, 'o')  `   

Now I want to calculate the CDF,

   sorncount = flipud(histp1);  
   cumsump1 = cumsum(sorncount);  
   normcumsump1 = cumsump1/max(cumsump1);  
   cdf = flipud(normcumsump1);  
   figure;plot(xp1, cdf, 'ok');  

I'm wondering whether anyone can help me to know if I'm ok or am I doing something wrong?

Gohann
  • 155
  • 1
  • 12
  • 1
    What are the flips for? They seem unnecessary... Also the cumsum should not need to be normalized as you define the pdf to sum to 1. Looks good besides that. – Raab70 May 08 '14 at 02:33
  • flips are for take the sum higher or equal than a certain value p1. Otherwise I think i would give us just the P(X <= x_i), whcich is what a dont looking for. By other side, oyur located a thing that I didint realized. Why should I take the probability instead the histogram to make the cumsum? how should you do this?. Thanks in advance any comment. – Gohann May 09 '14 at 08:42
  • 1
    A histogram is the correct function to use there. A histogram counts the relative frequency of a value, which is essentially synonymous with the probability of it occurring (When normalized of course). The flips make sense in that context. I missed the `X>=x` part. A CDF is traditionally [defined](https://en.wikipedia.org/wiki/Cumulative_distribution_function) the other way `P(X<=x)` – Raab70 May 09 '14 at 13:14
  • Dear Raab70. So, Is it good what I've wrote in my previous code? Thanks my friend. – Gohann May 14 '14 at 09:45

1 Answers1

1

Your code works correctly, but is a bit more complicated than it could be. Since probp1 has been normalized to have sum equal to 1, the maximum of its cumulative sum is guaranteed to be 1, so there is no need to divide by this maximum. This shortens the code a bit:

xp1 = linspace(min(p1), max(p1));   %range of bins  
histp1 = histc(p1(:), xp1);         %count for each bin
probp1 = histp1/sum(histp1);        %PDF (probability distribution function)  
cdf = flipud(cumsum(flipud(histp1)));   %CDF (unconventional, of P(X>=a) kind)

As Raab70 noted, most of the time CDF is understood as P(X<=a), in which case you don't need flipud: taking cumsum(histp1) is all that's needed.

Also, I would probably use histp1(end:-1:1) instead of flipud(histp1), so that the vector is flipped no matter if it's a row or column.