I just learned that MASS::fitdistr
, when fitting a negative binomial, is sensitive to the number of zeroes... a bummer since I was hoping to fit this distribution to count data of species where the number of zeroes is unknown and I'd argue unknowable. My goal here is to fit a negative binomial given only the positive (nonzero) part of the distribution... and have confidence that on simulated data, it would return (approximately) the simulated parameter values. I'm not dedicated to using MASS::fitdistr
. Thanks for any suggestions.
# function to fit neg binomial to abundances of species at the per-site level
nbpar <- function(ab){
MASS::fitdistr(ab, densfun = "Negative Binomial"
, lower=c(1e-9, 1e-9))}
# simulate an abundance vector
set.seed(100)
site_abundance<-rnbinom(667, size = 0.4, mu = 30)
# fit the distribution and get simulation parameters back out
nbpar(site_abundance) # returns something very close to simulated parameters
# fit again with zeros omitted
nbpar(site_abundance[site_abundance>0]) # Oh Snap, gives nonsense... at least in the sense that the estimated parameters are pretty far off from the inputs!
UPDATE: For me, a fitting algorithm would work if it returned (in a fairly accurate, low-bias way) parameters that fit the complete data (including 0s), based only on the truncated (strictly positive) data. An expanded example of why MASS::fitdistr
isn't what I want:
# first few lines are example as above
##########
##########
library(ggplot2)
# function to fit neg binomial to abundances of species at the per-site level
nbpar <- function(ab){
MASS::fitdistr(ab, densfun = "Negative Binomial"
, lower=c(1e-9, 1e-9))}
trunc<-function(x){x[x>0]}
# simulate an abundance vector
set.seed(100)
# slightly more abstract than first example
trials<-667
size = 0.4
mu = 30
site_abundance <-rnbinom(n = trials, size = size, mu = mu)
# fit the distribution and get simulation parameters back out
nbpar(site_abundance) # returns something very close to simulated parameters
# fit again with zeros omitted
x<-nbpar(site_abundance[site_abundance>0]) # different parameters
##############
##############
# new stuff
# I suspected the parameters drift upwards
# if I do this iteratively, and a quick test
# showed I was right
drift <- data.frame()
for(driftSteps in c(1:40)){
mypar <- nbpar(trunc(site_abundance))
size <- mypar$estimate[[1]]
mu <- mypar$estimate[[2]]
site_abundance <- rnbinom(n = trials, size = size, mu = mu)
drift[driftSteps,"driftSteps"]<- driftSteps
drift[driftSteps,"size"]<- size
drift[driftSteps,"mu"]<- mu
}
drift %>% ggplot(aes(driftSteps, mu)) +
geom_point() +
theme_classic()