
I have a problem with a certain vector. I'm trying to find out whether it is gamma-distributed and, if so, what the parameters (shape, rate) are. My vector has 400 entries, but let's take this as an example:

x <- c(45.94,31.04,17.49,9.81,6.34,4.18,2.93,2.01,1.61,1.27,1.04,0.809)

I read something about fitdistr(), but I didn't quite understand what it actually does. I tried the following with my real (long) vector:

 fitdistr(x, "gamma")
  shape         rate    
 0.167498708   0.519997226 
(0.008849548) (0.068359517)
Warning messages:
1: In densfun(x, parm[1], parm[2], ...) : NaNs produced
2: In densfun(x, parm[1], parm[2], ...) : NaNs produced
3: In densfun(x, parm[1], parm[2], ...) : NaNs produced
4: In densfun(x, parm[1], parm[2], ...) : NaNs produced
5: In densfun(x, parm[1], parm[2], ...) : NaNs produced
6: In densfun(x, parm[1], parm[2], ...) : NaNs produced
7: In densfun(x, parm[1], parm[2], ...) : NaNs produced

What does the output mean? Are these my fitted parameters? I tested them, but the KS test rejected the fit:

> ks.test(anzahl, "pgamma", 0.167498708, 0.519997226)

One-sample Kolmogorov-Smirnov test

data:  anzahl
D = 0.3388, p-value < 2.2e-16
alternative hypothesis: two-sided

So could you maybe tell me how I can find out if my vector is gamma-distributed and what the parameters are?

Flo Chi
  • try plotting a curve with those parameters on your data and see how it looks. you can probably just ignore the warnings, they are common when fitting maximum likelihood without specifying ranges for parameters – Rorschach Aug 05 '15 at 19:49
  • First off, this might be better posed in http://stats.stackexchange.com/ -- that forum might be better able to address the non-programmatic part of the question. That said, I think the warnings are most likely caused by near-singular behavior around shape=1. If your data has many small values that is likely to be a problem. – user295691 Aug 05 '15 at 20:16
  • I tried it. Looks strange somehow :D The scaling doesn't fit. Is this the right way how to do it? I still don't know how to find out if my vector is gamma-distributed – Flo Chi Aug 05 '15 at 20:53
  • I've just realized that I might have a scaling problem. When I see a proper curve in my plot, the x-axis goes from 0 to 100, but the y-axis goes from 0 to 50 because my first numbers are 46, 31, etc., whereas the gamma density values only go from 0 to maybe 0.1. So how do I handle the scaling in addition to my original problem? – Flo Chi Aug 05 '15 at 21:11

2 Answers


Well, I just had the very same trouble with some gamma distributed data I'm handling.

One thing to check is how the parameters are passed through ks.test(). Extra arguments are forwarded to pgamma(), whose signature is pgamma(q, shape, rate = 1, scale = 1/rate), so unnamed values are matched as shape and then rate. Naming them explicitly rules out any mix-up between rate and scale:

ks.test(x, "pgamma", shape=0.167498708, rate=0.519997226)

If that does not help, try the Kolmogorov-Smirnov test simulation procedure described on Cross Validated.
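I don't have the exact Cross Validated procedure at hand, so the following is only a sketch of one common simulation approach, a parametric bootstrap. The idea is that the plain KS p-value is not valid when the parameters were estimated from the same data, so the KS statistic is recomputed on samples simulated from the fitted gamma. It assumes the MASS package is installed; the helper name ks_stat_gamma, the seed and the number of replicates B are my own choices.

```r
library(MASS)  # provides fitdistr()

set.seed(1)  # arbitrary seed, only so the sketch is reproducible
x <- c(45.94, 31.04, 17.49, 9.81, 6.34, 4.18, 2.93, 2.01, 1.61, 1.27, 1.04, 0.809)

# Hypothetical helper: fit a gamma by maximum likelihood and
# return the KS statistic against that fitted gamma.
ks_stat_gamma <- function(v) {
  est <- suppressWarnings(fitdistr(v, "gamma"))$estimate
  d <- suppressWarnings(
    ks.test(v, "pgamma", shape = est[["shape"]], rate = est[["rate"]])
  )$statistic
  list(d = unname(d), est = est)
}

obs <- ks_stat_gamma(x)

# Simulate from the fitted gamma, refit each sample, and collect the KS statistics.
B <- 200  # assumed number of bootstrap replicates
d_sim <- replicate(B, {
  v <- rgamma(length(x), shape = obs$est[["shape"]], rate = obs$est[["rate"]])
  # A failed fit on a simulated sample is recorded as NA and dropped below.
  tryCatch(ks_stat_gamma(v)$d, error = function(e) NA_real_)
})

# Bootstrap p-value: share of simulated statistics at least as large as the observed one.
p_boot <- mean(d_sim >= obs$d, na.rm = TRUE)
p_boot
```

A small p_boot would speak against the gamma assumption; with the real 400-entry vector, replace x accordingly.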

Finally, I must say that if I take your example vector x and run fitdistr() on it, I get shape=0.7177 and rate=0.0692, which give D=0.18302, p-value=0.7527. So something seems to be going wrong with your fitdistr(x, "gamma") call.
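For reference, a minimal sketch of how that check could be run on the example vector, assuming the MASS package (which provides fitdistr()) is installed. The variable names are my own, and suppressWarnings() only hides the NaN warnings from the optimizer. The last lines compare the named call with the positional one.

```r
library(MASS)  # provides fitdistr()

x <- c(45.94, 31.04, 17.49, 9.81, 6.34, 4.18, 2.93, 2.01, 1.61, 1.27, 1.04, 0.809)

# Maximum-likelihood fit of a gamma distribution (parameters: shape and rate).
fit <- suppressWarnings(fitdistr(x, "gamma"))
shape_hat <- fit$estimate[["shape"]]
rate_hat  <- fit$estimate[["rate"]]
fit

# KS test against the fitted gamma, with the parameters named explicitly.
ks_named <- ks.test(x, "pgamma", shape = shape_hat, rate = rate_hat)
ks_named

# The same test with positional arguments: pgamma() matches them as shape, rate.
ks_positional <- ks.test(x, "pgamma", shape_hat, rate_hat)
all.equal(ks_named$statistic, ks_positional$statistic)
```

Keep in mind that the p-value is optimistic here, because the parameters were estimated from the same data that is being tested.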

elcortegano

Just take a look at a graph of your data. Since it has only 400 entries, you may be better off trying to fit it in MS Excel using =GAMMADIST() rather than R. If your graph resembles a gamma density curve (just google an image of the curve and compare), you can then try to fit the data to a gamma curve. The fitdistr() output above tells you that the best-fitting gamma density has parameters alpha (shape) = 0.167498708 and beta (rate) = 0.519997226, but the KS test says that it is a very poor fit. I guess that looking at the graph will tell you more.
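If you would rather stay in R for the visual check, here is a rough sketch, assuming the MASS package is installed and using the short example vector from the question (the variable names are my own). The histogram is drawn on the density scale (freq = FALSE) so that it shares a y-axis with dgamma(), and a Q-Q plot against the fitted gamma is added as a second view.

```r
library(MASS)  # provides fitdistr()

x <- c(45.94, 31.04, 17.49, 9.81, 6.34, 4.18, 2.93, 2.01, 1.61, 1.27, 1.04, 0.809)

# Fit the gamma distribution and pull out the estimates.
fit <- suppressWarnings(fitdistr(x, "gamma"))
shape_hat <- fit$estimate[["shape"]]
rate_hat  <- fit$estimate[["rate"]]

# Histogram on the density scale, with the fitted gamma density on top.
hist(x, freq = FALSE, breaks = 10, main = "Data vs. fitted gamma density")
# Inside curve(), x is the plotting variable, not the data vector.
# Start slightly above 0 because the density is unbounded there when shape < 1.
curve(dgamma(x, shape = shape_hat, rate = rate_hat),
      from = 0.01, to = max(x), add = TRUE, lwd = 2)

# Q-Q plot: sample quantiles against quantiles of the fitted gamma.
p <- ppoints(length(x))
qqplot(qgamma(p, shape = shape_hat, rate = rate_hat), x,
       xlab = "Theoretical gamma quantiles", ylab = "Sample quantiles")
abline(0, 1)  # points near this line suggest a reasonable fit
```

If the curve follows the histogram and the Q-Q points stay close to the line, a gamma model is at least plausible.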

Gaurav