
I was wondering if I could get some advice about fitting hurdle models using continuous data and covariates. I have some continuous data that are generally well fit by a right-skewed distribution such as a Pareto, Gamma, or Weibull distribution. However, there are several zeros in my data which are important to my analysis. In addition, I have some categorical (two-level) covariates and would like to model the parameters of a distribution as a function of these covariates in order to formally evaluate their importance (e.g., using AIC).

I have seen examples of hurdle models fit to continuous data but have not yet found any examples of how to incorporate covariates and a model-selection framework. Does anyone have any suggestions as to how to proceed, or know of any R packages that allow this procedure?

I have included some code below to reproduce the type of data I am working with. The non-zero data are generated from a generalized Pareto distribution using the package texmex. The parameters were estimated directly from my non-zero data. I have also included the code to plot the data in a histogram to show their distribution.

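```r
library("texmex")
set.seed(101)

# 8 exact zeros plus 17 positive draws from a generalized Pareto distribution
zeros <- rep(0, 8)
non_zeros <- rgpd(17, sigma = exp(-10.4856), xi = 0.1030, u = 0)
all.data <- c(zeros, non_zeros)

# Histogram of the positive values, with the zeros overlaid in black
hist(non_zeros, breaks = 50, xlim = c(0, 0.00015), ylim = c(0, 9), main = "", xlab = "",
     col = "gray")
hist(zeros, add = TRUE, col = "black", breaks = 100, xlim = c(0, 0.00015), ylim = c(0, 9))
legend("topright", legend = c("zeros"), col = "black", lwd = 8)
```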


Ben Bolker
JBauder
  • This is a good question, but I think it may be more suitable to [CrossValidated](http://stats.stackexchange.com) depending on whether you are looking for packages (itself a gray area as to whether it merits closure on SO) or statistical approaches. Can you list/explain the examples you have seen and specifically how they fall short of your needs? http://comments.gmane.org/gmane.comp.lang.r.ecology/2124 – Ben Bolker Apr 07 '15 at 15:26
  • cran.r-project.org/web/packages/pscl/vignettes/countreg.pdf – IRTFM Apr 07 '15 at 15:39
  • @BondedDust, I don't think (although admittedly I haven't double-checked) that the countreg vignette will do it; the OP is asking about a zero-inflated *continuous* distribution, most of the stuff in pscl is for zero-inflated or -altered *discrete* distributions. – Ben Bolker Apr 07 '15 at 16:28
  • Good point. I suppose one could try a `quasi-poisson` family to sidestep the requirement of integer data, but probably better to extend the search to tobit or other models. This presentation offers a discussion of different approaches to the problem: http://people.musc.edu/~nab42/CAC%20Analysis.pdf – IRTFM Apr 07 '15 at 17:02
  • @BenBolker is correct, pscl just covers hurdle models for discrete count data. In addition to tobit models, which use a single latent Gaussian variable to drive both zeros and non-zeros, one could use a probit model (`glm`) plus a truncated Gaussian model (`truncreg`), similar to the count hurdle. With 23 observations this might be asking a lot, though. – Achim Zeileis Apr 08 '15 at 06:12
  • Thanks everyone for your suggestions. I guess I am asking about both the statistical methods for fitting hurdle-models with continuous data and covariates, as well as any packages that are available. All examples I had seen previously dealt with count data (i.e., ZI Poisson). I will look into the tobit models although I am not sure how a truncated Gaussian would handle such right-skewed data. Any suggestions? Thanks again! – JBauder Apr 08 '15 at 17:17
  • Here's a similar question over on the stats site, it seems to be a better fit there: https://stats.stackexchange.com/q/187824/3601 – Aaron left Stack Overflow Jun 18 '18 at 23:43
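
The two-part idea raised in the comments (a binary model for zero versus non-zero, plus a separate model for the positive values) can be sketched in base R. This is only an illustration under stated assumptions, not a method taken from the thread: it swaps the truncated Gaussian for a Gamma GLM with a log link, since the positive values are right-skewed, and it uses a logistic rather than probit link. The covariate `group`, the sample size, and all simulated parameter values are hypothetical. Because the two parts share no parameters, their log-likelihoods add, so the AIC of the combined model can be taken as the sum of the two AICs.

```r
# Two-part (hurdle) sketch in base R.
# `group` and every simulated value below are hypothetical, for illustration only.
set.seed(101)
n     <- 200
group <- factor(rep(c("a", "b"), each = n / 2))

# Simulated truth, part 1: the probability of a zero differs by group.
p_zero  <- ifelse(group == "a", 0.2, 0.5)
is_zero <- rbinom(n, size = 1, prob = p_zero) == 1

# Simulated truth, part 2: Gamma-distributed positive values whose mean differs by group.
mu <- ifelse(group == "a", 2, 4)
y  <- ifelse(is_zero, 0, rgamma(n, shape = 2, rate = 2 / mu))

dat <- data.frame(y = y, group = group, nonzero = as.integer(y > 0))
pos <- subset(dat, y > 0)

# Part 1: logistic regression for zero vs. non-zero, without and with the covariate.
bin0 <- glm(nonzero ~ 1,     family = binomial, data = dat)
bin1 <- glm(nonzero ~ group, family = binomial, data = dat)

# Part 2: Gamma GLM (log link) fitted to the positive values only.
gam0 <- glm(y ~ 1,     family = Gamma(link = "log"), data = pos)
gam1 <- glm(y ~ group, family = Gamma(link = "log"), data = pos)

# The parts share no parameters, so the combined AIC is the sum of the part AICs.
hurdle_aic <- c(
  null      = AIC(bin0) + AIC(gam0),
  zero_only = AIC(bin1) + AIC(gam0),
  pos_only  = AIC(bin0) + AIC(gam1),
  both      = AIC(bin1) + AIC(gam1)
)
print(sort(hurdle_aic))
```

Two caveats apply. R computes the AIC of a Gamma GLM with a plug-in dispersion estimate, so the values are best treated as approximate. And with only 25 observations, as in the question, each part has very little data behind it, so AIC differences between candidate models may be unstable.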
