3

I am trying to fit this data to a weibull distribution:

My x and y variables are:

y <- c(1, 1, 1, 4, 7, 20, 7, 14, 19, 15, 18, 3, 4, 1, 3, 1, 1, 1, 1, 1, 1, 1, 1, 1)
x <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24)

The plot looks like this:

enter image description here

I am looking for something like this: fitted plot

and I want to fit a weibull curve to it. I am using the nls function in R like this:

nls(y ~ ((a/b) * ((x/b)^(a-1)) * exp(- (x/b)^a)))

This function always throws up an error saying:

Error in numericDeriv(form[[3L]], names(ind), env) : 
  Missing value or an infinity produced when evaluating the model
In addition: Warning message:
In nls(y ~ ((a/b) * ((x/b)^(a - 1)) * exp(-(x/b)^a))) :
  No starting values specified for some parameters.
Initializing ‘a’, ‘b’ to '1.'.
Consider specifying 'start' or using a selfStart model

So first I tried different starting values without any success. I cannot understand how to make a "good" guess at the starting values. Then I went with the SSweibull(x, Asym, Drop, lrc, pwr) function which is a selfStart function. Now the SSWeibull function expects values for Asym,Drop,lrc and pwr and I don't have any clue as to what those values might be.

I would appreciate if someone could help me figure out how to proceed.

Background of the data: I have taken some data from bugzilla and my "y" variable is number of bugs reported in a particular month and "x" variable is the month number after release.

Aditya Bhatia
  • 35
  • 1
  • 5
  • Perhaps this post on Cross Validated would help: http://stats.stackexchange.com/questions/19866/how-to-fit-a-weibull-distribution-to-input-data-containing-zeroes – MrFlick Oct 13 '15 at 21:51
  • I did go through that post before asking this question but didn't find it useful because I have no problem using the fitdistr function. And fitdistr is giving the best distribution fit, not the best curve fit. – Aditya Bhatia Oct 13 '15 at 21:54
  • 1
    _hint_: a probability density function integrates to 1. Is this true of a curve fitted from your data? – Hong Ooi Oct 13 '15 at 22:00
  • You seem to have some problems with the statistics behind what you're doing which makes me thing this belongs on [stats.se]. First, Weibull is a univariate, continuous distribution. The fact that you have integers and that those integers occur more than once is problematic for such a distribution. Also, you're trying to fit values in the count scale to the probability density scale which isn't ever going to match up. You should consider finding the MLE for the shape and scale parameters as described in the other answer. – MrFlick Oct 13 '15 at 22:03
  • I am not trying to fit values in count scale to the probability density scale.. I have used MLE and it does not help me. I will edit the question to show you what I am looking for. – Aditya Bhatia Oct 13 '15 at 22:16

1 Answers1

4

You might consider modifying your formula to better fit the data. For example, you could add an intercept (since your data flatlines at 1 instead of 0, which the model wants to do) and a scalar multiplier since you aren't actually fitting a density.

It is always worth spending some time really thinking about what parameters make sense, because model fitting procedures are often quite sensitive to initial estimates. You can also do a grid-search where you come up with ranges of possible parameters and try fitting the model with the various combinations using error catching functions. nls2 has an option to do this for you.

So, for example,

## Put the data in a data.frame
dat <- data.frame(x=x, y=y)

## Try some possible parameter combinations
library(nls2)
pars <- expand.grid(a=seq(0.1, 100, len=10),
                    b=seq(0.1, 100, len=10),
                    c=1,
                    d=seq(10, 100, len=10))

## brute-force them
## note the model has changed slightly
res <- nls2(y ~ d*((a/b) * ((x/b)^(a-1)) * exp(- (x/b)^a)) + c, data=dat,
           start=pars, algorithm='brute-force')

## use the results with another algorithm
res1 <- nls(y ~ d*((a/b) * ((x/b)^(a-1)) * exp(- (x/b)^a)) + c, data=dat,
           start=as.list(coef(res)))

## See how it looks
plot(dat, col="steelblue", pch=16)
points(dat$x, predict(res), col="salmon", type="l", lwd=2)

enter image description here

Not perfect, but it's a start.

Rorschach
  • 31,301
  • 5
  • 78
  • 129
  • This seems really nice. Thanks! I didn't have any clue about nls2. So to improve on your method..I would have to probably use more parameters? Or do you have any better suggestion? – Aditya Bhatia Oct 14 '15 at 01:16
  • I would play with the actual model formulation. You could also look into maximum likelihood estimation. – Rorschach Oct 14 '15 at 13:45
  • I looked into fitdistr for maximum likelihood estimation but I couldn't find the correct way to use it. – Aditya Bhatia Oct 15 '15 at 00:28
  • `fitdistr` uses MLE by default I believe, `nls` doesn't. I like `mle2` from `bbmle` but there are other functions. – Rorschach Oct 15 '15 at 00:33
  • Yeah. fitdistr uses MLE; I know that. But the output from the fitdistr function was very strange. I used something fitdistr(y,"weibull") but I am making some mistake in this because the fitted distribution didn't make any sense. – Aditya Bhatia Oct 15 '15 at 00:39
  • @AdityaBhatia it looks reasonable to me. Consider, (using `fitdist` from `fitdistrplus` - it has more options and self-starts well), `fit <- fitdist(y, 'weibull'); plot(density(y)); points((x=seq(0, 25, length.out=100)), dweibull(x, coef(fit)[[1]], coef(fit)[[2]]), type="l", col="red")` – Rorschach Oct 15 '15 at 00:50