This is my first post, so if there is any improvement to be made or further information that I can give please let me know.
I am using R 3.4.3 on a mac running 10.11.6.
I am working with count data. I am interested in the theta/ dispersion/ size parameter (k) of the negative binomial distribution (my understanding is that these terms are used interchangeably). I am fitting the NB distribution to these data estimating the parameters via maximum likelihood using the fitdistrplus
package, with the fitdist
function. I am interested if different groups of my data are better modelled using different distributions. I therefore fit the distribution to all the data. The data is then split based on one 2 level factor and the distribution fit to these two separate groups.
When I fit the distribution to the whole data set, I get an estimate of mu and size with standard errors. I then split the data. This same approach works fine for half of the data (Group A), but not the other half (Group B), which should, in theory, be structurally the same. Instead I get an estimate of mu and size but with NA for the standard errors.
Imposing lower = c(0,0) and upper = c(inf, inf) for the optim
function behind fitdist
also does not achieve anything, and as the output is NA not NaN I don't think it has anything to do with it trying to estimate negative numbers anyway (error 100 that is often discussed).
And just out of interest I removed all zero's in case it was something to do with that and that did nothing either.
So my question is why are the NA's being produced (and ultimately how do I get standard error's for the estimates)?
Here is my data and my code:
require(fitdistrplus)
data.set <- read.csv("data.set.csv")
count.A <- subset(data.set, category == "A")
count.B <- subset(data.set, category == "B")
# All Data
plotdist(data.set$count, histo = TRUE, demp = TRUE)
count.nb <- fitdist(data.set$count, "nbinom")
plot(count.nb)
LL.nb <- logLik(count.nb)
count.p <- fitdist(data.set$count, "pois")
plot(count.p)
LL.p <- logLik(count.p)
cdfcomp(list(count.p, count.nb),legendtext = c("Poisson", "negative binomial"))
gofstat(list(count.p, count.nb),fitnames = c("Poisson", "negative binomial"))
# Group A
plotdist(count.A$count, histo = TRUE, demp = TRUE)
A.count.nb <- fitdist(count.A$count, "nbinom")
plot(A.count.nb)
A.LL.nb <- logLik(A.count.nb)
A.count.p <- fitdist(count.A$count, "pois")
plot(A.count.p)
A.LL.p <- logLik(A.count.p)
cdfcomp(list(A.count.p, A.count.nb),legendtext = c("Poisson", "negative binomial"))
gofstat(list(A.count.p, A.count.nb),fitnames = c("Poisson", "negative binomial"))
# Group B
plotdist(count.B$count, histo = TRUE, demp = TRUE)
B.count.nb <- fitdist(count.B$count, "nbinom", method = "mle")
plot(B.count.nb)
B.LL.nb <- logLik(B.count.nb)
B.count.p <- fitdist(count.B$count, "pois")
plot(B.count.p)
B.LL.p <- logLik(B.count.p)
cdfcomp(list(B.count.p, B.count.nb),legendtext = c("Poisson", "negative binomial"))
gofstat(list(B.count.p, B.count.nb),fitnames = c("Poisson", "negative binomial"))