0

There are more than 50 exposure variables and a total of 16,000 data.
I need to analyze the association between all exposure variables and outcome.

I would like to repeat the following formula.

example.data <- data.frame(outcome = c(0,0,0,1,0,1,0,0,1,0),
                           exposure_1 = c(2.03, 2.13, 0.15, -0.14, 0.32,2.03, 2.13, 0.15, -0.14, 0.32),
                           exposure_2 = c(-0.11, 0.93, -1.26, -0.95, 0.24,-0.11, 0.93, -1.26, -0.95, 0.24),
                           age = c(20, 25, 30, 35, 40, 50, 55, 60, 65, 70),
                           bmi = c(20, 23, 21, 20, 25, 18, 20, 25, 26, 27))

logit_1 <- glm(outcome ~exposure_1, family = binomial, data = example.data)
logit_2 <- glm(outcome~ exposure_2 + age+ bmi, family = binomial, data = example.data)

I made a formula like this.

Model1 <- function(x) {
  temp <- glm(reformulate(x,response="outcome"), data=example.data, family=binomial)
  c(exp(summary(temp)$coefficients[2,1]), # OR
    exp(confint(temp)[2,1]), # CI 2.5 
    exp(confint(temp)[2,2]), # CI 97.5
    summary(temp)$coefficients[2,4], # P-value
    colAUC(predict(temp, type = "response"),example.data$outcome)) #AUC
Model1.data <- as.data.frame(t(sapply(setdiff(names(example.data),"outcome"), Model1)))
}

Model2 <- function(x) {
  temp <- glm(reformulate(x + age + bmi, response="outcome"), data=example.data, family=binomial)
  c(exp(summary(temp)$coefficients[2,1]), # OR
    exp(confint(temp)[2,1]), # CI 2.5 
    exp(confint(temp)[2,2]), # CI 97.5
    summary(temp)$coefficients[2,4], # P-value
    colAUC(predict(temp, type = "response"),example.data$outcome)) #AUC
}
Model2.data <- as.data.frame(t(sapply(setdiff(names(example.data),"outcome"), Model2)))

However, function "Model2" is not working.
The code that I made only operates single binary logistic regression, but can not be analyzed by adding multiple confounders.

2 Answers2

1

Use c() not + in reformulate. In both your functions, x takes the value of column names. I'll use mtcars column names to illustrate:

## Model1 works
reformulate("hp", response = "mpg")
# mpg ~ hp

## Model2 doesn't work
reformulate("hp" + wt + cyl, response = "mpg")
# Error in reformulate("hp" + wt + cyl, response = "mpg") : 
#   object 'wt' not found

## Fix it with `c()` and quoted column names
reformulate(c("hp", "wt", "cyl"), response = "mpg")
# mpg ~ hp + wt + cyl

## Showing it works with a mix of variables and quoted column names
x = "hp"
reformulate(c(x, "wt", "cyl"), response = "mpg")
# mpg ~ hp + wt + cyl

So in your Model2, change to reformulate(c(x, "age", "bmi"), response="outcome")

You have additional problems - you are running Model2 on all columns except for outcome (setdiff(names(example.data),"outcome")), but you should also exclude bmi and age since they are included inside the function.

Gregor Thomas
  • 136,190
  • 20
  • 167
  • 294
  • Thank you for explaining not only the solution but also the principle that the formula does not work. As you told me, I changed the formula and the calculation was successfully working. For the additional problem, I would like to ask you if it's okay if I don't use/interpret the OR values of ```bmi``` and ```age```. – Junghun Yoo Nov 09 '20 at 01:27
  • The question of whether or not to use/interpret the `bmi` and `age` predictors is not one I can answer. It should mostly be answered by subject matter expertise, and I don't have a clue what your `outcome` is, or even what subject this model deals with. – Gregor Thomas Nov 09 '20 at 03:03
0

So, if I understand correctly, you want one model outcome ~ exposure and one model outcome ~ exposure + age + bmi for each of the 50 exposure variables...

Here's one way to do that using functional programming, assuming all the exposure variables are named exposure_1, exposure_2 etc. :

library(tidyverse)

# formulas for exposure only
lexpo1 <-
  str_subset(string = names(example.data), pattern = "^exposure") %>%
  as.list() %>%
  modify(., ~ reformulate(termlabels = ., response = "outcome"))

# formulas for exposure + age + bmi
lexpo2 <-
  lexpo1 %>%
  modify(~ modelr::add_predictors(.x, ~age, ~bmi))

# put together
list_mods <-
  c(lexpo1, lexpo2)

# fit the models
list_res <-
  list_mods %>%
  map(., ~ glm(., family = binomial, data = example.data))
meriops
  • 997
  • 7
  • 6