-1

My R-script produces glm() coeffs below. What is Poisson's lambda, then? It should be ~3.0 since that's what I used to create the distribution.

Call:
glm(formula = h_counts ~ ., family = poisson(link = log), data = pois_ideal_data)

Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-22.726  -12.726   -8.624    6.405   18.515  

Coefficients:
             Estimate Std. Error z value Pr(>|z|)    
(Intercept)  8.222532   0.015100  544.53   <2e-16 ***
h_mids      -0.363560   0.004393  -82.75   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for poisson family taken to be 1)

Null deviance: 11451.0  on 10  degrees of freedom
Residual deviance:  1975.5  on  9  degrees of freedom
AIC: 2059

Number of Fisher Scoring iterations: 5


random_pois = rpois(10000,3)
h=hist(random_pois, breaks = 10)
mean(random_pois) #verifying that the mean is close to 3.
h_mids = h$mids 
h_counts = h$counts  
pois_ideal_data <- data.frame(h_mids, h_counts) 
pois_ideal_model <- glm(h_counts ~ ., pois_ideal_data, family=poisson(link=log))
summary_ideal=summary(pois_ideal_model)
summary_ideal
Zheyuan Li
  • 71,365
  • 17
  • 180
  • 248
sermolin
  • 161
  • 1
  • 2
  • 6

2 Answers2

2

What are you doing here???!!! You used a glm to fit a distribution???

Well, it is not impossible to do so, but it is done via this:

set.seed(0)
x <- rpois(10000,3)
fit <- glm(x ~ 1, family = poisson())

i.e., we fit data with an intercept-only regression model.

fit$fitted[1]
# 3.005

This is the same as:

mean(x)
# 3.005
Zheyuan Li
  • 71,365
  • 17
  • 180
  • 248
2

It looks like you're trying to do a Poisson fit to aggregated or binned data; that's not what glm does. I took a quick look for canned ways of fitting distributions to canned data but couldn't find one; it looks like earlier versions of the bda package might have offered this, but not now.

At root, what you need to do is set up a negative log-likelihood function that computes (# counts)*prob(count|lambda) and minimize it using optim(); the solution given below using the bbmle package is a little more complex up-front but gives you added benefits like easily computing confidence intervals etc..

Set up data:

set.seed(101)
random_pois <- rpois(10000,3)
tt <- table(random_pois)
dd <- data.frame(counts=unname(c(tt)),
                 val=as.numeric(names(tt)))

Here I'm using table rather than hist because histograms on discrete data are fussy (having integer cutpoints often makes things confusing because you have to be careful about right- vs left-closure)

Set up density function for binned Poisson data (to work with bbmle's formula interface, the first argument must be called x, and it must have a log argument).

 dpoisbin <- function(x,val,lambda,log=FALSE) {
     probs <- dpois(val,lambda,log=TRUE)
     r <- sum(x*probs)
    if (log) r else exp(r)
 }

Fit lambda (log link helps prevent numerical problems/warnings from negative lambda values):

library(bbmle)
m1 <- mle2(counts~dpoisbin(val,exp(loglambda)),
        data=dd,
        start=list(loglambda=0))
all.equal(unname(exp(coef(m1))),mean(random_pois),tol=1e-6)  ## TRUE
exp(confint(m1))
##    2.5 %   97.5 % 
## 2.972047 3.040009 
Ben Bolker
  • 211,554
  • 25
  • 370
  • 453