0

So I want to find the estimate parameter using GLM and compare it with mle2 package. Here's my code for GLM

d <- read.delim("http://dnett.github.io/S510/Disease.txt")

d$disease=factor(d$disease)
d$ses=factor(d$ses)
d$sector=factor(d$sector)
str(d)
glm2 <- glm(disease~ses+sector, family=binomial(link=logit), data=d)
summary(glm2)

And my code for mle2()

y<-as.numeric(as.character(d$disease))
x1<-as.numeric(as.character(d$age))
x2<-as.numeric(as.character(d$sector))
x3<-as.numeric(as.character(d$ses))

library(bbmle)
nlldbin=function(A,B,C,D){
  eta<-A+B*(x3==2)+C*(x3==3)+D*(x2==2)
  p<-1/(1+exp(-eta))
  joint.pdf= (p^y)*((1-p)^(1-y))
  -sum(joint.pdf, log=TRUE ,na.rm=TRUE)
}
st <- list(A=0.0001,B=0.0001,C=0.0001,D=0.0001)
est_mle2<-mle2(start=st,nlldbin,hessian=TRUE)
summary(est_mle2)

But the result is quiet different. Please help me to fix this, thank you!

> summary(est_mle2)
Maximum likelihood estimation

Call:
mle2(minuslogl = nlldbin, start = st, hessian.opts = TRUE)

Coefficients:
     Estimate  Std. Error z value  Pr(z)
A    -20.4999   5775.1484 -0.0035 0.9972
B     -5.2499 120578.9515  0.0000 1.0000
C     -7.9999 722637.2670  0.0000 1.0000
D     -2.2499  39746.6639 -0.0001 1.0000

> summary(glm2)

Call:
glm(formula = disease ~ ses + sector, family = binomial(link = logit), 
    data = d) 

Coefficients:
            Estimate Std. Error z value Pr(>|z|)    
(Intercept) -1.52001    0.33514  -4.535 5.75e-06 ***
ses2        -0.08525    0.41744  -0.204 0.838177    
ses3         0.16086    0.39261   0.410 0.682019    
sector2      1.28098    0.34140   3.752 0.000175 ***
Waldi
  • 39,242
  • 6
  • 30
  • 78

2 Answers2

0

This line

-sum(joint.pdf, log=TRUE ,na.rm=TRUE)

is wrong. sum doesn't have a special log argument; what you're doing is adding the value TRUE (which gets converted to 1) to the pdf.

What you want is

-sum(log(joint.pdf), na.rm=TRUE)

but this is also not very good for numerical reasons, as the pdf is likely to underflow. A better way of writing it would be

logpdf <- y*log(p) + (1-y)*log(1-p)
-sum(logpdf, na.rm=TRUE)
Hong Ooi
  • 56,353
  • 13
  • 134
  • 187
  • bu the results is different, estimate from GLM: (Intercept) -1.52001, ses2 -0.08525, ses3 0.16086 , sector2 1.28098. While mle2, A -0.9515150, B -0.0259063, C -0.0210864, D -0.0053744. – Jasmine Helen Apr 26 '21 at 07:55
0

I'm not sure your definition of eta is correct. I would use the model matrix.

X <- model.matrix(~ ses + sector, data = d)

nlldbin <- function(A,B,C,D){
  eta <- X %*% c(A, B, C, D)
  p <- 1/(1+exp(-eta))
  logpdf <- y*log(p) + (1-y)*log(1-p)
  -sum(logpdf)
}
Stéphane Laurent
  • 75,186
  • 15
  • 119
  • 225