2

I have a nested loop with 60 dimensions, i.e. I nest 60 loops into each other. In Stata the MWE would look like the following:

forvalues i = 1/60 {
    forvalues j = 1/60 {
        forvalues k = 1/60 {
            forvalues l = 1/60 {
                ... imagine the 56 remaining loops here
            }
        }
    }
}

The equivalent in R is:

for(i in 1:60) {
    for(j in 1:60) {
        for(k in 1:60) {
            for(l in 1:60) {
                ... imagine the 56 remaining loops here
            }
        }
    }
}

The objective here is to avoid typing all 60 levels into my code and create a loop for the loop structure itself instead. This question appears so trivial. But for some reason I am struggling to come up with a solution.

Thank you for any suggestions.

Additional Information:

I have a dataset with 60 explanatory variables in it and would like to run regressions with every possible combination of these variables. More specifically I run univariate regressions of the dependent variable on all 60 explanatory variables separately and calculate certain criteria. Then I add a second regressor to the estimation equation and calculate criteria again. I.e. reg DependentVar ExplVar1 ExplVar2, reg DependentVar ExplVar1 ExplVar3, ..., reg DependentVar ExplVar60 ExplVar59. Dependent on the calculated criteria a branch of that regression tree is either advanced or terminated. E.g. the first branch reg DependentVar ExplVar1 ExplVar2 either continues to grow as reg DependentVar ExplVar1 ExplVar2 ExplVar3, reg DependentVar ExplVar1 ExplVar2 ExplVar4 etc. or terminated as reg DependentVar ExplVar1 ExplVar2. Branches that contain an explanatory factor more than once are also cut - such as reg DependentVar ExplVar1 ExplVar1 or reg DependentVar ExplVar1 ExplVar2 ExplVar1. Overall, I hence design a model selection approach. I am aware of already existant model selection commands, but need one that is customized to specific properties of the given dataset.

Chr
  • 1,017
  • 1
  • 8
  • 29
  • 2
    Maybe zoom out and tell us the end goal you're trying to achieve. It's rare that this kind of approach can't be solved in a better way. – Thomas Aug 22 '18 at 15:11
  • 4
    I find it hard to imagine this is going to be your most efficient method. You will end up with 60^60 (approx 5 *10^106) iterations. If you give the end goal people may be able to suggest a better method – Chris Littler Aug 22 '18 at 15:14
  • *I nest 60 loops into each other.*... possibly not. Your question reads as the [XY Problem](https://meta.stackexchange.com/a/66378) where you ask help for your attempted *Y* solution to an actual *X* problem you do not share. Give us more background and problem and you may not need any loops! – Parfait Aug 22 '18 at 15:48
  • I edited the question. It now provides additional information describing the issue. – Chr Aug 22 '18 at 16:03
  • The answer by [@Onyambu](https://stackoverflow.com/questions/51357270/r-reshape-data-by-combining-common-value-of-two-variables/51357912?noredirect=1#comment90532893_51357912), is probably can extend to your solutions? – user5249203 Aug 22 '18 at 16:05
  • Have a look here https://stackoverflow.com/q/40765869/5784831 – Christoph Aug 22 '18 at 16:05
  • 2
    It sounds like you want fairly standard forward stepwise regression, which is provided by the **olsrr** package, see: https://cran.r-project.org/web/packages/olsrr/vignettes/variable_selection.html. If - for some reason - you want to write this from scratch, you almost certainly want to use recursion rather than a bunch of nested for-loops. – Thomas Aug 22 '18 at 16:33
  • Thank you for your suggestion, @Thomas . The _Stepwise AIC Forward Regression_ of the **olsrr** package would suit my problem quite well if I had not that many missing values in my dataframe. In fact there is not a single observation for which all 60 potential explantory variables are given. The first line `model <- lm(y ~ ., data = mydata)` hence fails already. However, I do not want to estimate a model that contains all 60 regressors but test all possible variable subsets for which common observations are available. Does anyone know how to solve this? – Chr Aug 23 '18 at 12:07
  • What you are asking makes this analysis difficult. If you are comparing regression models to each other using different data sets (by removing rows that are missing values) then you can end up with strange effects indeed. And you need to make a conscious decision before you remove rows to make sure you aren't causing problems with your analysis. You can consider whether it is possible to impute variables. For example if I know the temperature at 10am and the temperature at 12pm, then I can probably impute the temperature at 11am with reasonable accuracy. – Adam Sampson Aug 23 '18 at 20:47
  • You also need to consider that there may be multicollinearity in your model. Have you checked to see how much each variable correlates to every other variable? `corrplot::corrplot(M, method="number")` – Adam Sampson Aug 23 '18 at 20:48

1 Answers1

5

Consider rapply with combn. Below demonstrates for 5 explanatory variables. For actual use case:

  • replace paste0("ExplVar", 1:5) with names of your 60 variables (possibly using names(df))
  • replace sequence 1:5 to 1:60 which includes simple one variable regression
  • replace DepVar with actual dependent variable

Being the the recursive member of the apply family, rapply (which I never dreamed would be dusted off the shelf for an SO answer!) will build a character vector of linear formulas from nested list which can then be iterated with lm:

expvar_list <- lapply(1:5, function(x) combn(paste0("ExplVar", 1:5), x, simplify=FALSE))

formulas_list <- rapply(expvar_list, function(x) paste("DepVar ~", paste(x, collapse="+")))
formulas_list
#  [1] "DepVar ~ ExplVar1"                                    
#  [2] "DepVar ~ ExplVar2"                                    
#  [3] "DepVar ~ ExplVar3"                                    
#  [4] "DepVar ~ ExplVar4"                                    
#  [5] "DepVar ~ ExplVar5"                                    
#  [6] "DepVar ~ ExplVar1+ExplVar2"                           
#  [7] "DepVar ~ ExplVar1+ExplVar3"                           
#  [8] "DepVar ~ ExplVar1+ExplVar4"                           
#  [9] "DepVar ~ ExplVar1+ExplVar5"                           
# [10] "DepVar ~ ExplVar2+ExplVar3"                           
# [11] "DepVar ~ ExplVar2+ExplVar4"                           
# [12] "DepVar ~ ExplVar2+ExplVar5"                           
# [13] "DepVar ~ ExplVar3+ExplVar4"                           
# [14] "DepVar ~ ExplVar3+ExplVar5"                           
# [15] "DepVar ~ ExplVar4+ExplVar5"                           
# [16] "DepVar ~ ExplVar1+ExplVar2+ExplVar3"                  
# [17] "DepVar ~ ExplVar1+ExplVar2+ExplVar4"                  
# [18] "DepVar ~ ExplVar1+ExplVar2+ExplVar5"                  
# [19] "DepVar ~ ExplVar1+ExplVar3+ExplVar4"                  
# [20] "DepVar ~ ExplVar1+ExplVar3+ExplVar5"                  
# [21] "DepVar ~ ExplVar1+ExplVar4+ExplVar5"                  
# [22] "DepVar ~ ExplVar2+ExplVar3+ExplVar4"                  
# [23] "DepVar ~ ExplVar2+ExplVar3+ExplVar5"                  
# [24] "DepVar ~ ExplVar2+ExplVar4+ExplVar5"                  
# [25] "DepVar ~ ExplVar3+ExplVar4+ExplVar5"                  
# [26] "DepVar ~ ExplVar1+ExplVar2+ExplVar3+ExplVar4"         
# [27] "DepVar ~ ExplVar1+ExplVar2+ExplVar3+ExplVar5"         
# [28] "DepVar ~ ExplVar1+ExplVar2+ExplVar4+ExplVar5"         
# [29] "DepVar ~ ExplVar1+ExplVar3+ExplVar4+ExplVar5"         
# [30] "DepVar ~ ExplVar2+ExplVar3+ExplVar4+ExplVar5"         
# [31] "DepVar ~ ExplVar1+ExplVar2+ExplVar3+ExplVar4+ExplVar5"

models_list <- lapply(formulas_list, function(x) summary(lm(as.formula(x), mydata)))

NOTE: Be careful as the number of combinations for 60 variables across varying lengths is very high!

Parfait
  • 104,375
  • 17
  • 94
  • 125
  • Thanks a lot. It works perfectly. Using your code I can easily parallelize the process to cut down on computational time. – Chr Aug 24 '18 at 14:33
  • 1
    Wow! Great! Once again, never thought `rapply` would ever be used. I am a believer. – Parfait Aug 24 '18 at 14:34