df <- data.frame(
  disease = c(0,1,0,1),
  var1 = c(0,1,2,0),
  var2 = c(0,1,2,0),
  var3 = c(0,1,2,0),
  var40 = c(0,1,2,0),
  Bi = c(0,1,0,1),
  gender = c(1,0,1,0),
  P1 = c(-0.040304832,0.006868288,0.002663759,0.020251087),
  P2 = c(0.010566526,0.002663759,0.017480721,-0.008685749),
  P3 = c(-0.008685749,0.020251087,-0.040304832,0.002663759),
  P4 = c(0.017480721,0.024306667,0.002663759,0.010566526),
  stringsAsFactors = FALSE)

The data frame above (df) contains categorical and numerical variables: disease, Bi and gender are coded 0/1, var1 to var40 are coded 0/1/2, and P1, P2, P3 and P4 are continuous numerical variables. The glm call for a single variable is:

glm(disease ~ var1*Bi + gender + P1 + P2 + P3 + P4,
    family = binomial(link = 'logit'), data = df)

I need some help writing a loop that automatically performs the multivariate regression analysis of disease against each variant, var1 to var40, with the same covariates: Bi, gender, P1, P2, P3 and P4. I tried the loop below for all 40 variants, but it does not work:

for (i in df$var1:df$var40) {glm(DepVar1 ~ i*Bi+gender+P1+P2+P3+P4, data=df, 
family=binomial("logit")) }

1 Answer

Building formulas dynamically can be a bit tricky, but functions like update() and reformulate() can help. For example:

results <- Map(function(i) {
  newform <- update(disease ~ Bi+gender+P1+P2+P3+P4, reformulate(c(".", i)))
  glm(newform, data=df, family=binomial("logit")) 
}, names(subset(df, select=var1:var40)))

Here we use Map rather than a for loop so it's easier to save the results (with this method they are collected into a list). We use update() to add the new variable of interest to the base formula. So for example

update(disease ~ Bi+gender+P1+P2+P3+P4, ~ . + var1)
# disease ~ Bi + gender + P1 + P2 + P3 + P4 + var1

This adds a variable to the right-hand side. We use reformulate() to turn the column name, given as a string, into a formula.
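The model in the question also contains the var1*Bi interaction, which the update() call above does not add. Here is a minimal sketch of one way to build that formula for each variant, assuming the interaction with Bi is wanted every time. `make_formula` is a hypothetical helper introduced for illustration; it is not part of the original answer.

```r
# Hypothetical helper: build
#   disease ~ <variant>*Bi + gender + P1 + P2 + P3 + P4
# from a column name supplied as a string.
make_formula <- function(variant) {
  reformulate(
    termlabels = c(paste0(variant, "*Bi"), "gender", "P1", "P2", "P3", "P4"),
    response = "disease"
  )
}

f1  <- make_formula("var1")   # formula with the var1 main effect and var1:Bi
f40 <- make_formula("var40")  # same structure for var40
```

Passing make_formula(i) to glm() inside the Map() call, in place of newform, should fit the interaction model for each variant.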

You can get the individual models out of the list with

results$var1
results$var40
# etc
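Once the models are in a list, it is often handy to pull the estimate for each variant into one table. The sketch below is one possible way to do that. It uses simulated stand-in data (the `toy` data frame, the seed, and the reduced covariate set are all assumptions made so that the example runs on its own), not the data from the question.

```r
# Simulated stand-in data; hypothetical values, only for illustration
set.seed(1)
n <- 200
toy <- data.frame(
  disease = rbinom(n, 1, 0.5),
  var1    = sample(0:2, n, replace = TRUE),
  var2    = sample(0:2, n, replace = TRUE),
  Bi      = rbinom(n, 1, 0.5),
  gender  = rbinom(n, 1, 0.5),
  P1      = rnorm(n, sd = 0.02)
)

# Same pattern as above: one logistic model per variant, stored in a named list
fits <- Map(function(v) {
  f <- update(disease ~ Bi + gender + P1, reformulate(c(".", v)))
  glm(f, data = toy, family = binomial("logit"))
}, c("var1", "var2"))

# One row per variant: estimate, standard error, z value and p-value
# of that variant's own coefficient
variant_rows <- do.call(rbind, lapply(names(fits), function(v) {
  est <- coef(summary(fits[[v]]))
  data.frame(variant = v, t(est[v, ]), check.names = FALSE)
}))

print(variant_rows)
```

With only the four rows shown in the question, glm() cannot estimate this many coefficients reliably, so the table is only meaningful on the full data set.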
MrFlick