I want to test the output of a simple linear regression model against data altered via a 'grid search' method of combinations to find the optimal data preparation.

Let's say I have x test variables, each containing n rows of data. x and n may vary between data sets. I also have a scaling vector, v, whose length may also change.

For example:

tbl <-  read.table(text = 
    "Field1 Field2
    100 200
    150 180
    200 160
    280 250
    300 300
    300 250",
header = TRUE) #length(x) is 2 here

v <- c(0, 0.1, 0.2) # length(v) is 3

What I want to do is loop through (or 'apply' ?) each subset of combinations of the scaling vector v and in each iteration, test my model.

In other words, effectively loop through possible values of v for each x:

Field 1   Field 2
0.0       0.0
0.1       0.0
0.2       0.0
0.0       0.1
0.1       0.1
0.2       0.1
0.0       0.2
0.1       0.2
0.2       0.2

and in each iteration, scale Field 1 by the value in its column and Field 2 by the value in its column. My scaling function is actually filter(tbl, v, method="recursive") (thanks!). So in the first iteration my data frame will be unchanged (both Field1 and Field2 have v = 0), in the second iteration Field 1 will have the filter applied with v = 0.1 while Field 2 will be unchanged (v = 0) ... and in combination 4 Field 1 will be unchanged (v = 0) while Field 2 will have the filter applied with v = 0.1.
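For reference, here is a minimal sketch of what the recursive filter does to a single column, using the Field1 values from the example. With one coefficient, stats::filter(..., method = "recursive") computes y[i] = x[i] + v * y[i-1], starting from a previous value of 0, so a coefficient of 0 should return the data unchanged. The names field1, v1, filtered, manual and unchanged are illustrative only.

```r
field1 <- c(100, 150, 200, 280, 300, 300)
v1 <- 0.1

# Built-in recursive filter; as.numeric() drops the time-series class
filtered <- as.numeric(stats::filter(field1, v1, method = "recursive"))

# The same recursion written out by hand: y[i] = x[i] + v1 * y[i - 1]
manual <- numeric(length(field1))
prev <- 0
for (i in seq_along(field1)) {
  manual[i] <- field1[i] + v1 * prev
  prev <- manual[i]
}

# A coefficient of 0 leaves the series as it was
unchanged <- as.numeric(stats::filter(field1, 0, method = "recursive"))
```

Comparing filtered with manual using all.equal() is a quick way to check the recursion before running it over the whole grid.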

I can brute-force this with nested loops; however, I will have different numbers of x variables (likely somewhere between 1 and 10), and the length of v may vary too. Writing out 10 nested loops feels like the wrong way to go.

Can I create a matrix of these possible combinations, then somehow apply them against my data frame? I'm unclear how to do this if so and any help would be appreciated!

Many thanks.

Jon
  • I think you want `expand.grid` to create your possibilities – Dason Mar 05 '18 at 16:26
  • I think I can generate the combination matrix using the plyr package: `expand.grid(rlply(length(x), v))`. This gives me the various combinations, but how do I apply them? Guess I could loop at this stage? – Jon Mar 05 '18 at 16:46
  • You could just use apply directly. – Dason Mar 05 '18 at 17:07
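The grid suggested in the comments can also be built in base R, without plyr. This is a rough sketch that assumes the columns to scale are listed by name in a vector, here called fields (a hypothetical name, not from the original post). It works for any number of fields and any length of v.

```r
fields <- c("Field1", "Field2")  # hypothetical: names of the columns to scale
v <- c(0, 0.1, 0.2)

# One copy of v per field, named after the field, then every combination.
# expand.grid varies the first column fastest, as in the table in the question.
grid <- expand.grid(setNames(rep(list(v), length(fields)), fields))

# Number of combinations is length(v) ^ length(fields)
n_combinations <- nrow(grid)
```

Naming the grid columns after the data columns means each row can later be matched to the data frame by name rather than by position.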

1 Answer

I went with a semi-brute force answer in the end.

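First, I created a grid of all combinations, with one column per field to be scaled:

x <- c("Field1", "Field2")  # names of the columns to scale (as in the question's example)
combo.matrix <- expand.grid(setNames(rep(list(v), length(x)), x))

Then I looped through each row of this grid, with an inner loop over each field in x.

 # source.data, lm.formula and reg.results are assumed to exist already
 for (r in seq_len(nrow(combo.matrix))) {

   new.df <- source.data  # Reset to the unscaled data

   for (field in x) {

        # Apply the recursive filter with this row's coefficient for this field
        # (filter() lives in stats, not base)
        new.df[[field]] <- as.numeric(stats::filter(
                new.df[[field]],
                combo.matrix[r, field],
                method = "recursive"))

    }

    # Run regression
    regression <- lm(lm.formula, data = new.df)
    reg.results$Adjusted.r2[r] <- summary(regression)$adj.r.squared
 }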

An apply function may well have been better, but I wasn't confident in how this would work. Correct answer awarded to anyone who can :) - but otherwise, I'm good to go. Thanks.
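For what it's worth, here is one way the apply version might look. It is only a sketch under some assumptions: the response column y is made up purely so the example runs, the formula y ~ Field1 + Field2 stands in for the real lm.formula, and fields, grid, scale_fields and best are hypothetical names.

```r
tbl <- read.table(text = "Field1 Field2
100 200
150 180
200 160
280 250
300 300
300 250", header = TRUE)
tbl$y <- c(10, 14, 17, 26, 29, 27)  # made-up response, for illustration only

v <- c(0, 0.1, 0.2)
fields <- c("Field1", "Field2")
grid <- expand.grid(setNames(rep(list(v), length(fields)), fields))

# Return a copy of `data` with each named field recursively filtered
# by the coefficient of the same name in `coefs`
scale_fields <- function(data, coefs) {
  for (f in names(coefs)) {
    data[[f]] <- as.numeric(stats::filter(data[[f]], coefs[[f]],
                                          method = "recursive"))
  }
  data
}

# apply() over rows passes each combination as a named numeric vector
grid$Adjusted.r2 <- apply(grid, 1, function(coefs) {
  fit <- lm(y ~ Field1 + Field2, data = scale_fields(tbl, coefs))
  summary(fit)$adj.r.squared
})

# Combination with the highest adjusted R-squared
best <- grid[which.max(grid$Adjusted.r2), ]
```

Keeping the scaling in its own function means the same apply call works whether there is one field or ten, since the number of fields only affects the width of the grid.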

Jon