
To fit a generalized linear model (GLM), is it possible and/or necessary to specify additional distribution parameters (such as lambda for a Poisson distribution)? Currently, I am using R and stats::glm().
Or does glm() try to find the best value (e.g. of lambda) itself? In that case, is there a way to extract these estimated distribution parameters from the model fit?

While trying to understand these models and how to choose the best distribution according to the type of data and the "look" of the distribution, I realized that when I compare my response variable to the theoretical distributions, the comparison obviously depends heavily on such additional distribution parameters.

(Edit: Is a Poisson distribution appropriate at all? In the meantime I have read further: should one even consider zero-inflated models?)

Figure: real data and Poisson models

blue: real data of the response variable
green: Poisson distribution with the theoretically correct lambda of 0.8; in my opinion it does not visually resemble the real (blue) data well on the left side, where most counts are located
red: Poisson distributions with lower lambdas; to me these actually look closer to the blue real data on the left side, though not on the right side, where very few data points are located

Edit: added code
The essential parts of the code I use are

hist(y, col="blue", xlim = c(0,10), breaks=0:10 - 0.1)
hist(rpois(n = 10000, lambda = mean(y, na.rm = TRUE)), col="green", xlim = c(0,10), breaks=0:10 - 0.1)
hist(rpois(n = 10000, lambda = 0.6), col="red", xlim = c(0,10), breaks=0:10 - 0.1)
hist(rpois(n = 10000, lambda = 0.4), col="red", xlim = c(0,10), breaks=0:10 - 0.1)

The model I want to fit for y (a clinical outcome scale, integer 0-6) is

model <- glm(formula = y ~ logical.intervention + logical.diagnostictest + numeric.duration + integer.severity, family = poisson)

Background: We want to find out what influences the outcome most (and suspect that the diagnostic test has little if any meaning in addition to an obvious disease, which all investigated persons have).
(Bonus question: of course, at best we can show that the effect of the test is not significant, i.e. that we fail to reject the null, whereas other variables really drive the outcome. Is there a better approach for this kind of "inverse" question?)

Martin
  • can you please show us your code/some of the models you're fitting? Do you have covariates? – Ben Bolker Mar 12 '21 at 23:22
  • @BenBolker Thanks for looking at this Q. I added code and explained more about the model/covariates. The main question, though, is whether it is possible/necessary to specify distribution parameters. – Martin Mar 13 '21 at 01:07

1 Answer

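It turns out that specifying these parameters is neither possible (see also, indirectly, https://stackoverflow.com/a/46939437/3414968) nor necessary, since glm() estimates them from the input data. For example, for the Poisson distribution lambda = E(X); for the Gamma distribution, the parameters follow from E(X) = k * theta and Var(X) = k * theta^2. Note that in a GLM with covariates, E(X) is the conditional mean given the covariates, so each observation gets its own fitted value rather than a single overall lambda.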
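
As a minimal sketch of how these estimates can be pulled out of a fit in R: the data below are simulated stand-ins (x, y and g are hypothetical, not the data from the question), so treat this as an illustration under those assumptions rather than a recipe for the actual model. It shows that an intercept-only Poisson fit recovers lambda = mean(y), that a fit with a covariate yields one fitted lambda per observation via fitted(), and that for a Gamma fit k and theta can be derived from the fitted mean and the dispersion estimate reported by summary().

```r
# Simulated stand-in data (hypothetical; the original y and covariates are not shown)
set.seed(1)
n <- 500
x <- rnorm(n)
y <- rpois(n, lambda = exp(-0.3 + 0.5 * x))

# Intercept-only Poisson GLM: the single fitted lambda is the sample mean
fit0 <- glm(y ~ 1, family = poisson)
lambda0 <- unname(exp(coef(fit0)))          # log link, so back-transform with exp()
print(all.equal(lambda0, mean(y), tolerance = 1e-6))

# With a covariate, glm() estimates one lambda per observation
fit1 <- glm(y ~ x, family = poisson)
lambda_i <- fitted(fit1)                    # same as predict(fit1, type = "response")
print(head(lambda_i))

# Gamma GLM: fitted() gives the mean k * theta for each observation;
# the dispersion estimate corresponds to Var / mean^2 = 1 / k
g <- rgamma(n, shape = 2, scale = exp(0.2 + 0.3 * x) / 2)
fitg <- glm(g ~ x, family = Gamma(link = "log"))
phi <- summary(fitg)$dispersion             # moment-based (Pearson) estimate
k_hat <- 1 / phi                            # estimated shape k
theta_i <- fitted(fitg) * phi               # estimated scale theta per observation
print(k_hat)
```

For the model in the question, fitted(model) would give the estimated Poisson mean for each patient in the same way. This also suggests why the histogram of y need not resemble a single Poisson distribution with lambda = mean(y): the model only assumes a Poisson distribution conditional on the covariates.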

Martin