Iterate over models without hardcoding them in R

Question

I have this code

#  [1] "X6"  "X5"  "X7"  "X4"  "X3"  "X8"  "X19" "X9"  "X10" "X16"    
formula = result ~ X3+X4+X5+X6+X7+X8+X9+X10+X16+X19
full_magnetude_model = glm(formula, data = train)
full_magnetude_predict = predict(full_magnetude_model, newdata=test)

# Comparing results
full_magnetude_results <- ifelse(full_magnetude_predict > 0.5, 1, 0)
true_results = test$result

# results
table(full_magnetude_results,true_results)

It works properly, but the results fluctuate across different formulas, and I need to do the same for:

#  [1] "X6"  "X5"  "X7"  "X4"  "X3"  "X8"  "X19" "X9"  "X10" "X16"    
#  [1] "X6"  "X5"  "X7"  "X4"  "X3"  "X8"  "X19" "X9"  "X10"    
#  [1] "X6"  "X5"  "X7"  "X4"  "X3"  "X8"  "X19" "X9"      
#  [1] "X6"  "X5"  "X7"  "X4"  "X3"  "X8"  "X19"     
#  [1] "X6"  "X5"  "X7"  "X4"  "X3"  "X8"

and so on. I can do this manually, but is there a smarter way to do it?

Update

Full code: https://github.com/martin-varbanov96/fmi_summer_2018/blob/master/fmi_6ti_sem/Pril_stat/project/main.R

The idea is to make a list of formulas and apply my code to each element of the list.

I assume putting the formulas into a list and using `lapply` with your code on each of them should end up with a list of table results looking like what you're after (but the overall question is above my head, so I may overlook something) — Tensibai, Jul 02 '18 at 12:42
I am not sure what you mean but I think I can achieve the same by making a list of formulas and wrapping my code over a for loop for each formula — Hartun, Jul 02 '18 at 12:44
Let's see if I get it right: the first code block in your question is the example for 1 formula, and you'd want 5 tables, being the results for the 5 formulas in your second code block, right? — Tensibai, Jul 02 '18 at 12:47
yes that is correct, I've been thinking of doing something like formulas = [formula1, formula2.....] and looping for the array, using the formulas to do the same code — Hartun, Jul 02 '18 at 12:51
Check `list` and `lapply`, this should do the trick. I can't really write an answer without dummy data to fill in — Tensibai, Jul 02 '18 at 12:52
BTW your `full_magnetude_results <- ifelse(full_magnetude_predict > 0.5, 1, 0)` would be better without the ifelse, just use `full_magnetude_results <- full_magnetude_predict > 0.5` to get a TRUE/FALSE vector. (using integers in place of booleans without a reason is usually a bad practice IMHO) — Tensibai, Jul 02 '18 at 12:56
I've made an update, that last recommendation is not very good, because I'd also have to change my test set which is in {0,1} — Hartun, Jul 02 '18 at 13:02
You can get all the formulas: `Reduce(function(x,y)reformulate(attr(drop.terms(terms(x),y),"term.labels"),"results"),init = formula,10:7,accumulate = T)`. But you will still have to loop over them. Instead of doing this, you can use the `update` function, which will just drop one variable and give you the results when that variable is dropped. That will be the better option — Onyambu, Jul 02 '18 at 13:05
Then at worst: `as.integer(full_magnetude_predict > 0.5)` sounds less complex than ifelse — Tensibai, Jul 02 '18 at 13:06
`lapply(1:4,function(x)reformulate(head(a,-x),"results"))` where `a` is a vector of all the variables, ie `a=c("X3","X4",....)` — Onyambu, Jul 02 '18 at 13:11

score 2 · Accepted Answer · answered Jul 02 '18 at 13:18

Probably not a full answer but this should give an idea:

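# Candidate predictors; terms[1:x] keeps the first x of them
terms <- c("X3", "X4", "X5", "X6", "X7", "X8", "X9", "X10", "X16",
           "X19")

# Build one formula per model size: 10 terms, then 9, ... down to 6
lapply(10:6, function(x) {
  formula <- as.formula(paste("result ~ ", paste0(terms[1:x], collapse = "+")))
  formula
})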

Gives:

[[1]]
result ~ X3 + X4 + X5 + X6 + X7 + X8 + X9 + X10 + X16 + X19
<environment: 0x0000000016fbec50>

[[2]]
result ~ X3 + X4 + X5 + X6 + X7 + X8 + X9 + X10 + X16
<environment: 0x0000000016fc29a0>

[[3]]
result ~ X3 + X4 + X5 + X6 + X7 + X8 + X9 + X10
<environment: 0x00000000170f44b0>

[[4]]
result ~ X3 + X4 + X5 + X6 + X7 + X8 + X9
<environment: 0x00000000170f8a98>

[[5]]
result ~ X3 + X4 + X5 + X6 + X7 + X8
<environment: 0x00000000170fd1d0>

`lapply` calls the anonymous function once for each value in the range (10 down to 6). Inside the function, `x` is the number of terms to keep, so `terms[1:x]` selects the first `x` terms.
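To make the selection step concrete, here is a minimal sketch that builds the same kind of list with `reformulate()` from the stats package instead of pasting strings. The names `ranked`, `sizes` and `formulas` are made up for illustration, and the ordering of `ranked` is copied from the commented output in the question (assumed to be a ranking from most to least important). With that ordering the first term dropped is X16, whereas the sorted vector above drops X19 first.

```r
# Predictors in the order shown in the question's commented output
# (assumption: ranked from most to least important).
ranked <- c("X6", "X5", "X7", "X4", "X3", "X8", "X19", "X9", "X10", "X16")

# For each model size n, keep the first n predictors and build a formula.
sizes <- 10:6
formulas <- lapply(sizes, function(n) {
  reformulate(ranked[seq_len(n)], response = "result")
})

# Name each element by the number of predictors it uses.
names(formulas) <- paste0("n", sizes)

formulas$n6
```

Naming the list elements makes it easier to tell later which result belongs to which model size.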

The idea is to build each formula from the terms you need, here dropping one term each time: paste the terms together as shown in the ?as.formula documentation and convert the string to a formula. The rest of your code can be used as is inside the function; the resulting list will then contain tables instead of the formulas shown in this example.
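Putting the pieces together could look roughly like the sketch below. It is not tested against the asker's data: `train` and `test` are simulated stand-ins, and `make_data`, `fit_and_tabulate` and `tables` are hypothetical names introduced only for this example. The modelling steps themselves (`glm`, `predict`, the 0.5 cut-off, `table`) are the ones from the question.

```r
set.seed(42)  # reproducible simulated data; replace with your own train/test

# Hypothetical stand-in for the question's data: 10 numeric predictors
# and a 0/1 response named `result`.
predictors <- c("X3", "X4", "X5", "X6", "X7", "X8", "X9", "X10", "X16", "X19")
make_data <- function(n) {
  d <- as.data.frame(matrix(rnorm(n * length(predictors)), nrow = n,
                            dimnames = list(NULL, predictors)))
  d$result <- as.integer(d$X3 + d$X6 + rnorm(n) > 0)
  d
}
train <- make_data(200)
test  <- make_data(100)

# Fit the model that uses the first n_terms predictors and
# return its table of predicted vs. actual classes.
fit_and_tabulate <- function(n_terms) {
  f <- as.formula(paste("result ~", paste(predictors[1:n_terms], collapse = " + ")))
  model <- glm(f, data = train)  # same call as in the question
  predicted <- as.integer(predict(model, newdata = test) > 0.5)
  table(predicted = predicted, actual = test$result)
}

# One table per model size, from 10 predictors down to 6.
tables <- lapply(10:6, fit_and_tabulate)
names(tables) <- paste0("terms_", 10:6)

tables$terms_6
```

Wrapping the per-formula work in a named function keeps the `lapply` call short and lets you try a single model size on its own, for example `fit_and_tabulate(8)`.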
