
I have a data frame with an observed variable and its date stamp [Var1, DD, MM, YYYY] that runs to thousands of rows. I need to fit a distribution (exponential or gamma) to each year's values of the observed variable and get the fitted parameters for each year.

In MATLAB, it would be something like this:

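    % Var1 and year are vectors of equal length, sorted by date
    no_of_rows = numel(Var1);
    temp_data_year = [];
    k = 1;

    for i = 1:no_of_rows
        temp_data_year(end+1) = Var1(i);       % collect this year's values

        % fit when the year changes on the next row, or at the last row
        if i == no_of_rows || year(i+1) ~= year(i)
            param(k, :) = gamfit(temp_data_year);   % [shape, scale]
            k = k + 1;
            temp_data_year = [];               % start the next year
        end
    end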

This gives me the parameters for every year in the data in the variable param.

Is there something in R that can do this?

Thanks,

  • Look at `?by`, put your year variable in the `INDICES` parameter and your distribution fitter (`?MASS::fitdistr`) in the `FUN` parameter. Good luck. – Stephan Kolassa Jul 18 '14 at 13:25
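Following that suggestion, here is a minimal sketch using by() with MASS::fitdistr (MASS ships with R as a recommended package). The simulated df and its column names var1 and YYYY are assumptions chosen to match the answer's example data. When data is a data frame, by() passes each year's rows to FUN as a data frame, so FUN has to pull out the var1 column itself:

```r
library(MASS)   # for fitdistr()

# Simulated data in the question's layout (one value per day, three years)
set.seed(1)
df <- data.frame(var1 = c(rgamma(365, 2, 4), rgamma(365, 3, 5), rgamma(365, 1, 8)),
                 YYYY = rep(2012:2014, each = 365))

# by() calls FUN once per year, handing it that year's rows as a data frame
fits <- by(df, INDICES = df$YYYY,
           FUN = function(d) fitdistr(d$var1, densfun = "gamma")$estimate)

# Stack the per-year (shape, rate) vectors into a matrix, one row per year
params <- do.call(rbind, fits)
params
```

fitdistr() may print the same "NaNs produced" warnings discussed in the comments below while it searches for the maximum.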

1 Answer


Like this.

# creates a sample dataset - you have this already
set.seed(1)             # for reproducible example
df <- data.frame(var1=c(rgamma(365,2,4),rgamma(365,3,5),rgamma(365,1,8)),
                 YYYY=rep(2012:2014,each=365))

# you start here...
library(fitdistrplus)   # for fitdist(...)
aggregate(var1~YYYY,df,function(X)fitdist(X,distr="gamma")$estimate)
#   YYYY var1.shape var1.rate
# 1 2012   1.891706  3.873906
# 2 2013   2.812962  4.778191
# 3 2014   1.031067  7.826776

Read the documentation for fitdist(...) in the fitdistrplus package. Several fitting methods are available through its method argument, for example maximum likelihood (the default) or moment matching.
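To show what the choice of fitting method means without depending on fitdistrplus, the following base-R sketch fits each year by maximum likelihood with MASS::fitdistr and, for comparison, computes the closed-form method-of-moments gamma estimates (shape = mean^2/variance, rate = mean/variance). It also fits the exponential alternative from the question, whose maximum-likelihood estimate is simply rate = 1/mean. The simulated df mirrors the example above, the helper fit_year is hypothetical, and fitdistrplus's own "mme" implementation may differ in details such as the variance denominator.

```r
library(MASS)   # for fitdistr()

set.seed(1)     # same simulated layout as the example above
df <- data.frame(var1 = c(rgamma(365, 2, 4), rgamma(365, 3, 5), rgamma(365, 1, 8)),
                 YYYY = rep(2012:2014, each = 365))

# Hypothetical helper: several estimates for one year's values
fit_year <- function(x) {
  mle <- fitdistr(x, "gamma")$estimate          # numerical maximum likelihood
  m <- mean(x)
  v <- var(x)                                   # sample moments
  c(mle_shape = unname(mle["shape"]), mle_rate = unname(mle["rate"]),
    mme_shape = m^2 / v, mme_rate = m / v,      # method-of-moments gamma
    exp_rate  = 1 / m)                          # exponential MLE
}

# split() groups the values by year; sapply() gives one column per year
params <- t(sapply(split(df$var1, df$YYYY), fit_year))
round(params, 3)
```

With a few hundred observations per year the maximum-likelihood and moment estimates should usually be close; a large gap between them is a hint to look at the data more carefully.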

jlhoward
  • It worked, but I get a few warning messages: "In densfun(x,...):NaNs produced". Should I worry about them? – user2527808 Jul 18 '14 at 16:39
  • Not necessarily. It may be that the starting parameter estimates chosen by `fitdist(...)` were too far from the MLEs, so the optimizer tried invalid (e.g., negative) parameter values along the way, where the density returns NaN. You should plot the fitted distribution functions against histograms of your data to verify that the fit is good. You can also specify starting estimates with the `start=...` argument. Read the documentation. – jlhoward Jul 18 '14 at 23:52
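Both suggestions can be sketched with MASS::fitdistr for a single year of simulated data (the data are an assumption). The code supplies moment-based starting values through start and adds small lower bounds. Supplying lower makes fitdistr() use optim()'s box-constrained L-BFGS-B method, which keeps the search away from the negative shape or rate values where dgamma() returns NaN. It then overlays the fitted density on a density-scale histogram. The bound 1e-6 is an arbitrary illustrative choice.

```r
library(MASS)   # for fitdistr()

set.seed(1)
x <- rgamma(365, shape = 2, rate = 4)   # one year of simulated data

# Moment-based starting values and small positive lower bounds
# (the bounds switch the optimizer to L-BFGS-B)
fit <- fitdistr(x, "gamma",
                start = list(shape = mean(x)^2 / var(x), rate = mean(x) / var(x)),
                lower = c(1e-6, 1e-6))
fit$estimate

# Visual check: histogram on the density scale with the fitted density on top
hist(x, breaks = 30, freq = FALSE, main = "Gamma fit, one year")
curve(dgamma(x, shape = fit$estimate["shape"], rate = fit$estimate["rate"]),
      add = TRUE, lwd = 2)
```

Inside curve(), x refers to curve's own grid of plotting points rather than the data vector, which is how curve() expects its expression to be written.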