1

I need to repeat the sampling procedure of the below loop 1000 times using a second loop.

This is the simplified code i produced for reproducability, the inner loop.

##Number of iterations
N = 8

##Store data from inner loop in vectors
PMSE <- rep(1 , N)
PolynomialDegree <- rep(1, N)

for (I in 1:N){
PolynomialDegree [I] <-  I
PMSE [I] <- I*rnorm(1)
}

Now, using a second , outer loop. I want repeat this "sampling procedure" 1000 times and store the data of all those vectors into a single dataframe. Im struggling to write the outer loop and was hoping for some assistance.

This is my attempt with non-reproducable code, I hope it is clear what i am attempting to do.

##Set number of iterations
N <- 8
M <- 1000

##Store data
OUTPUT <- rep(1,M)

##Outer loop starts
for (J in 1:M){
  PMSE <- rep(1 , N)
  PolynomialDegree <- rep(1, N)
  sample <- sample(nrow(tempraindata), floor(nrow(tempraindata)*0.7))
  training <- tempraindata[sample,]
  testing <- tempraindata[-sample,]

##Inner loop starts
for (I in 1:N){
  
  
  ##Set up linear model with x polynomial of degree I x = year, y = temp
  
        mymodel <- lm(tem ~ poly(Year, degree = I),  data = training)
  
  ##fit model on testing set and save predictions
        predictions <- predict(mymodel, newdata = testing, raw = FALSE)
  
  ##define and store PMSE
        PMSE[I] <- (1/(nrow(tempraindata)- nrow(training)))*(sum(testing$tem-predictions))^2
        PolynomialDegree [I] <-  I
} ## End of inner loop
                OUTPUT[J] <- ##THIS IS WHERE I WANT TO SAVE THE DATA 
                                        
} ##End outer loop

I want to store all the data inside OUTPUT and make it a dataframe, if done correctly it should contain 8000 values of PMSE and 8000 values of PolynomialDegree.

JoF
  • 49
  • 5

1 Answers1

0

Avoid the bookkeeping of initializing vectors and then assigning elements by index. Consider a single sapply (or vapply) passing both iterations to build a matrix of 8,000 elements of the PSME calculations within a 1000 X 8 structure. Every column would then be a model run (or PolynomialDegree) and every row the training/testing data pair.

## Set number of iterations
N <- 8
M <- 1000

## Defined method to generalize process
calc_PSME <- function(M, N) {
    ## Randomly build training/testing sets
    set.seed(M+N)     # TO REPRODUCE RANDOM SAMPLES
    sample <- sample(nrow(tempraindata), floor(nrow(tempraindata)*0.7))
    training <- tempraindata[sample,]
    testing <- tempraindata[-sample,]

    ## Set up linear model with x polynomial of degree I x = year, y = temp
    mymodel <- lm(tem ~ poly(Year, degree = N),  data = training)
  
    ## Fit model on testing set and save predictions
    predictions <- predict(mymodel, newdata = testing, raw = FALSE)
  
    ## Return single PSME value
    (
      (1/(nrow(tempraindata)- nrow(training))) * 
      (sum(testing$tem-predictions)) ^ 2
    )
}

# RETURN (1000 X 8) MATRIX WITH NAMED COLUMNS
PSME_matrix <- sapply(1:N, calc_PSME, 1:M)

PSME_matrix <- vapply(1:N, calc_PSME, numeric(M), 1:M)

Should you need a 8,000-row data frame of two columns, consider reshape to long format:

long_df <- reshape(
  data.frame(output_matrix),
  varying = 1:8,
  timevar = "PolynomialDegree",
  v.names = "PSME",
  ids = NULL,
  new.row.names = 1:1E4,
  direction = "long"
)
Parfait
  • 104,375
  • 17
  • 94
  • 125
  • Thank you so much for the quick reply and the help. That looks like a very elegant solution, however due to restrictions in the school assignment, the problem has to be solved using an internal and external for loop. Furthermore, running these lines code: PSME_matrix <- sapply(1:N, calc_PSME, 1:M) PSME_matrix <- vapply(1:N, calc_PSME, numeric(M), 1:M) yields the error message : Error in if (degree < 1) stop("'degree' must be at least 1") : the condition has length > 1 Which I am not familiar with. Best Regards @parfait – JoF Aug 19 '22 at 02:43
  • Interesting school assignment restriction especially for R language. Ask instructor about R's *apply* family! To diagnose issue, what does a single run of defined method return: `calc_PSME(1, 1)`. Do you get a single value or vector of more than one value? Your modeling/prediction results may return `NULL.`. – Parfait Aug 19 '22 at 16:53
  • I will, thank you. Doing single runs yields perfectly reasonable values of PMSE. @Parfait – JoF Aug 19 '22 at 20:05
  • Looks like you may be facing this [issue](https://stackoverflow.com/questions/39803144/high-or-very-high-order-polynomial-regression-in-r-or-alternatives) regarding your polynomial degree. To debug, add `print(M, N)` as first line inside function. Then when error raises, you can find at what iteration issue occurs. – Parfait Aug 19 '22 at 20:18