How to make loop to contain confounders(covariates) in a binary logistic regression using glm in R?

Question

There are more than 50 exposure variables and a total of 16,000 data.
I need to analyze the association between all exposure variables and outcome.

I would like to repeat the following formula.

example.data <- data.frame(outcome = c(0,0,0,1,0,1,0,0,1,0),
                           exposure_1 = c(2.03, 2.13, 0.15, -0.14, 0.32,2.03, 2.13, 0.15, -0.14, 0.32),
                           exposure_2 = c(-0.11, 0.93, -1.26, -0.95, 0.24,-0.11, 0.93, -1.26, -0.95, 0.24),
                           age = c(20, 25, 30, 35, 40, 50, 55, 60, 65, 70),
                           bmi = c(20, 23, 21, 20, 25, 18, 20, 25, 26, 27))

logit_1 <- glm(outcome ~exposure_1, family = binomial, data = example.data)
logit_2 <- glm(outcome~ exposure_2 + age+ bmi, family = binomial, data = example.data)

I made a formula like this.

Model1 <- function(x) {
  temp <- glm(reformulate(x,response="outcome"), data=example.data, family=binomial)
  c(exp(summary(temp)$coefficients[2,1]), # OR
    exp(confint(temp)[2,1]), # CI 2.5 
    exp(confint(temp)[2,2]), # CI 97.5
    summary(temp)$coefficients[2,4], # P-value
    colAUC(predict(temp, type = "response"),example.data$outcome)) #AUC
Model1.data <- as.data.frame(t(sapply(setdiff(names(example.data),"outcome"), Model1)))
}

Model2 <- function(x) {
  temp <- glm(reformulate(x + age + bmi, response="outcome"), data=example.data, family=binomial)
  c(exp(summary(temp)$coefficients[2,1]), # OR
    exp(confint(temp)[2,1]), # CI 2.5 
    exp(confint(temp)[2,2]), # CI 97.5
    summary(temp)$coefficients[2,4], # P-value
    colAUC(predict(temp, type = "response"),example.data$outcome)) #AUC
}
Model2.data <- as.data.frame(t(sapply(setdiff(names(example.data),"outcome"), Model2)))

However, function "Model2" is not working.
The code that I made only operates single binary logistic regression, but can not be analyzed by adding multiple confounders.

Gregor Thomas · Accepted Answer · 2020-11-06T05:13:48.127

1

Use c() not + in reformulate. In both your functions, x takes the value of column names. I'll use mtcars column names to illustrate:

## Model1 works
reformulate("hp", response = "mpg")
# mpg ~ hp

## Model2 doesn't work
reformulate("hp" + wt + cyl, response = "mpg")
# Error in reformulate("hp" + wt + cyl, response = "mpg") : 
#   object 'wt' not found

## Fix it with `c()` and quoted column names
reformulate(c("hp", "wt", "cyl"), response = "mpg")
# mpg ~ hp + wt + cyl

## Showing it works with a mix of variables and quoted column names
x = "hp"
reformulate(c(x, "wt", "cyl"), response = "mpg")
# mpg ~ hp + wt + cyl

So in your Model2, change to reformulate(c(x, "age", "bmi"), response="outcome")

You have additional problems - you are running Model2 on all columns except for outcome (setdiff(names(example.data),"outcome")), but you should also exclude bmi and age since they are included inside the function.

edited Nov 06 '20 at 05:13

answered Nov 06 '20 at 05:00

Gregor Thomas

136,190
20
167
294

Thank you for explaining not only the solution but also the principle that the formula does not work. As you told me, I changed the formula and the calculation was successfully working. For the additional problem, I would like to ask you if it's okay if I don't use/interpret the OR values of ```bmi``` and ```age```. – Junghun Yoo Nov 09 '20 at 01:27
The question of whether or not to use/interpret the `bmi` and `age` predictors is not one I can answer. It should mostly be answered by subject matter expertise, and I don't have a clue what your `outcome` is, or even what subject this model deals with. – Gregor Thomas Nov 09 '20 at 03:03

score 0 · Answer 2 · answered Nov 06 '20 at 10:05

So, if I understand correctly, you want one model outcome ~ exposure and one model outcome ~ exposure + age + bmi for each of the 50 exposure variables...

Here's one way to do that using functional programming, assuming all the exposure variables are named exposure_1, exposure_2 etc. :

library(tidyverse)

# formulas for exposure only
lexpo1 <-
  str_subset(string = names(example.data), pattern = "^exposure") %>%
  as.list() %>%
  modify(., ~ reformulate(termlabels = ., response = "outcome"))

# formulas for exposure + age + bmi
lexpo2 <-
  lexpo1 %>%
  modify(~ modelr::add_predictors(.x, ~age, ~bmi))

# put together
list_mods <-
  c(lexpo1, lexpo2)

# fit the models
list_res <-
  list_mods %>%
  map(., ~ glm(., family = binomial, data = example.data))

How to make loop to contain confounders(covariates) in a binary logistic regression using glm in R?

2 Answers2