Multiple Logistic Regression with Interaction between Quantitative and Qualitative Explanatory Variables

Question

As a follow up to this question, I fitted the Multiple Logistic Regression with Interaction between Quantitative and Qualitative Explanatory Variables. MWE is given below:

Type  <- rep(x=LETTERS[1:3], each=5)
Conc  <- rep(x=seq(from=0, to=40, by=10), times=3)
Total <- 50
Kill  <- c(10, 30, 40, 45, 38, 5, 25, 35, 40, 32, 0, 32, 38, 47, 40)

df <- data.frame(Type, Conc, Total, Kill)

fm1 <- 
  glm(
      formula = Kill/Total~Type*Conc
    , family  = binomial(link="logit")
    , data    = df
    , weights = Total
    )

summary(fm1)

Call:
glm(formula = Kill/Total ~ Type * Conc, family = binomial(link = "logit"), 
    data = df, weights = Total)

Deviance Residuals: 
   Min      1Q  Median      3Q     Max  
-4.871  -2.864   1.204   1.706   2.934  

Coefficients:
            Estimate Std. Error z value Pr(>|z|)    
(Intercept) -0.65518    0.23557  -2.781  0.00541 ** 
TypeB       -0.34686    0.33677  -1.030  0.30302    
TypeC       -0.66230    0.35419  -1.870  0.06149 .  
Conc         0.07163    0.01152   6.218 5.04e-10 ***
TypeB:Conc  -0.01013    0.01554  -0.652  0.51457    
TypeC:Conc   0.03337    0.01788   1.866  0.06201 .  
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 277.092  on 14  degrees of freedom
Residual deviance:  96.201  on  9  degrees of freedom
AIC: 163.24

Number of Fisher Scoring iterations: 5

anova(object=fm1, test="LRT")

Analysis of Deviance Table

Model: binomial, link: logit

Response: Kill/Total

Terms added sequentially (first to last)


          Df Deviance Resid. Df Resid. Dev Pr(>Chi)    
NULL                         14    277.092             
Type       2    6.196        12    270.895  0.04513 *  
Conc       1  167.684        11    103.211  < 2e-16 ***
Type:Conc  2    7.010         9     96.201  0.03005 *  
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1


df$Pred <- predict(object=fm1, data=df, type="response")

df1 <- with(data=df,
               expand.grid(Type=levels(Type)
                           , Conc=seq(from=min(Conc), to=max(Conc), length=51)
                           )
      )
df1$Pred <- predict(object=fm1, newdata=df1, type="response")

library(ggplot2)
ggplot(data=df, mapping=aes(x=Conc, y=Kill/Total, color=Type)) + geom_point() +
  geom_line(data=df1, mapping=aes(x=Conc, y=Pred), linetype=2) +
  geom_hline(yintercept=0.5,col="gray")

I want to calculate LD50, LD90 and LD95 with their confidence intervals. As the interaction is significant so I want to calculate LD50, LD90 and LD95 with their confidence intervals for each Type (A, B, and C) separately.

LD stands for lethal dose. It is the amount of substance required to kill X% (LD50 = 50%) of the test population.

Edited Type is a qualitative variable representing different types of drugs and Conc is a quantitative variable representing different Concentrations of drugs.

What is this `LD50`, `LD90`and `LD95`, there is nothing like that in your MWE. — TheRimalaya, Mar 17 '16 at 17:20
as you pointed out in a message to me, [this question](http://stackoverflow.com/questions/35462144/lc50-ld50-confidence-intervals-from-multiple-regression-glm-with-interaction) is highly relevant. Have you taken a stab at adapting it to your question yet ? — Ben Bolker, Mar 19 '16 at 18:01
@TheRimalaya, he wants to calculate the concentration at which 50, 90 and 95% of animals are dead. — Axeman, Mar 19 '16 at 22:09
@BenBolker: Thanks for your comment. In the linked question both explanatory variables were treated quantitatively. However, in this problem one variable is quantitative and other is qualitative which make it different from the linked question. Thanks — MYaseen208, Mar 20 '16 at 09:29
Your variables are unclear to me. Can you please clarify **in your question, not the comments**? It appears `Type` represents a drug category, and `conc` represents the dosage, but I'm not sure — alexwhitworth, Mar 23 '16 at 16:28

score 4 · Accepted Answer · answered Mar 23 '16 at 20:21

You use the drc package to fit logistic dose-response models.

First fit the model

require(drc)
mod <- drm(Kill/Total ~ Conc, 
           curveid = Type, 
           weights = Total, 
           data = df, 
           fct =  L.4(fixed = c(NA, 0, 1, NA)), 
           type = 'binomial')

Here curveid=specifies the grouping of the data and fct= specifies a 4 parameter logistic function, with parameters for lower and upper bond fixed at 0 and 1.

Note the differences to glm are negligible:

df2 <- with(data=df,
            expand.grid(Conc=seq(from=min(Conc), to=max(Conc), length=51),
                        Type=levels(Type)))
df2$Pred <- predict(object=mod, newdata = df2)

Here's a histgramm of the differences to the glm prediction

hist(df2$Pred - df1$Pred)

Estimate Effective Doses (and CI) from the model

This is easy with the ED() function:

ED(mod, c(50, 90, 95), interval = 'delta')

Estimated effective doses
(Delta method-based confidence interval(s))

     Estimate Std. Error   Lower  Upper
A:50   9.1468     2.3257  4.5885 13.705
A:90  39.8216     4.3444 31.3068 48.336
A:95  50.2532     5.8773 38.7338 61.773
B:50  16.2936     2.2893 11.8067 20.780
B:90  52.0214     6.0556 40.1527 63.890
B:95  64.1714     8.0068 48.4784 79.864
C:50  12.5477     1.5568  9.4963 15.599
C:90  33.4740     2.7863 28.0129 38.935
C:95  40.5904     3.6006 33.5334 47.648

For each group we get ED50, ED90 & ED95 with CI.

Would appreciate if you guide me how to fit `glm(formula=Kill/Total~Rep+Type*Conc, family=binomial(link="logit"), data=df, weights=Total)` when `Rep` is a factor. Thanks — MYaseen208, Mar 27 '16 at 17:56
Thanks @EDi for your help. Would appreciate if you guide me how to fit `glm(formula=Kill/Total~Rep+Type*Conc, family=binomial(link="logit"), data=df, weights=Total)` when `Rep` is a factor. Thanks — MYaseen208, Mar 29 '16 at 01:41

score 3 · Answer 2 · answered Mar 23 '16 at 19:08

Your link function of choice (\eta= X\hat\beta) has variance for a new observation (x_0): V_{x_0}=x_0^T(X^TWX)^{-1}x_0

So, for a set of candidate doses, we can predict the expected percentage of deaths using the inverse function:

newdata= data.frame(Type=rep(x=LETTERS[1:3], each=5),
                    Conc=rep(x=seq(from=0, to=40, by=10), times=3))
mm <- model.matrix(fm1, newdata)

# get link on link terms (could also use predict)
eta0 <- apply(mm, 1, function(i) sum(i * coef(fm1)))

# inverse logit function
ilogit <- function(x) return(exp(x) / (1+ exp(x)))

# predicted probs
ilogit(eta0)


# for comfidence intervals we can use a normal approximation
lethal_dose <- function(mod, newdata, alpha) {
  qn <- qnorm(1 - alpha /2)
  mm <- model.matrix(mod, newdata)
  eta0 <- apply(mm, 1, function(i) sum(i * coef(fm1)))
  var_mod <- vcov(mod)

  se <- apply(mm, 1, function(x0, var_mod) {
    sqrt(t(x0) %*% var_mod %*% x0)}, var_mod= var_mod)

  out <- cbind(ilogit(eta0 - qn * se),
               ilogit(eta0),
               ilogit(eta0 + qn * se))
  colnames(out) <- c("LB_CI", "point_est", "UB_CI")

  return(list(newdata=newdata,
              eff_dosage= out))
}

lethal_dose(fm1, newdata, alpha= 0.05)$eff_dosage
$eff_dosage
       LB_CI point_est     UB_CI
1  0.2465905 0.3418240 0.4517820
2  0.4361703 0.5152749 0.5936215
3  0.6168088 0.6851225 0.7462674
4  0.7439073 0.8166343 0.8722545
5  0.8315325 0.9011443 0.9439316
6  0.1863738 0.2685402 0.3704385
7  0.3289003 0.4044270 0.4847691
8  0.4890420 0.5567386 0.6223914
9  0.6199426 0.6990808 0.7679095
10 0.7207340 0.8112133 0.8773662
11 0.1375402 0.2112382 0.3102215
12 0.3518053 0.4335213 0.5190198
13 0.6104540 0.6862145 0.7531978
14 0.7916268 0.8620545 0.9113443
15 0.8962097 0.9469715 0.9736370

Rather than doing this manually, you could also manipulate:

predict.glm(fm1, newdata, se=TRUE)$se.fit

Multiple Logistic Regression with Interaction between Quantitative and Qualitative Explanatory Variables

2 Answers2

Linked