
I am using the R programming language. I am interested in knowing if there is a way to estimate the actual run time of a procedure (relative to the "strength" of your computer) without actually running that procedure.

For example, suppose I want to determine how long the below procedure takes to run on my computer:

library(caret)
library(rpart)

# generate data
a <- rnorm(80000, 10, 10)
b <- rnorm(80000, 10, 5)
c <- rnorm(80000, 5, 10)
group <- sample(LETTERS[1:2], 80000, replace = TRUE, prob = c(0.5, 0.5))
group_1 <- 1:80000

# put data into a frame
d <- data.frame(a, b, c, group, group_1)
d$group <- as.factor(d$group)

e <- d
vec1 <- sample(200:300, 5)
vec2 <- sample(400:500, 5)
vec3 <- sample(700:800, 5)
z <- 0
# expand.grid varies its first argument fastest, so list vec3 first
# to match the loop nesting below (k innermost, i outermost)
df <- expand.grid(vec3 = vec3, vec2 = vec2, vec1 = vec1)
df$Accuracy <- NA

for (i in seq_along(vec1)) {
    for (j in seq_along(vec2)) {
        for (k in seq_along(vec3)) {
            # assign each row to a group based on the three cutoffs
            # (>= so that rows exactly on a cutoff aren't dumped into group 3)
            d$group_2 <- as.integer(ifelse(d$group_1 < vec1[i], 0,
                                    ifelse(d$group_1 >= vec1[i] & d$group_1 < vec2[j], 1,
                                    ifelse(d$group_1 >= vec2[j] & d$group_1 < vec3[k], 2, 3))))
            d$group_2 <- as.factor(d$group_2)

            # fit a tree to predict the new group, dropping group_1 (column 5)
            TreeFit <- rpart(group_2 ~ ., data = d[, -5])

            pred <- predict(TreeFit, d[, -5], type = "class")

            con <- confusionMatrix(d$group_2, pred)

            # update results into table
            z <- z + 1
            df$Accuracy[z] <- con$overall[1]
        }
    }
}

head(df)

I could just "sandwich" that procedure between the following lines of code and determine how long it took:

start_time <- proc.time()

# copy and paste the entire block of code here

proc.time() - start_time

#results

 user  system elapsed 
  51.86    0.36   52.22 
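
For reference, base R's system.time() wraps this same pattern in a single call:

# same measurement, returned as user/system/elapsed
system.time({
    # copy and paste the entire block of code here
})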

But suppose it is a really lengthy procedure and I want to roughly estimate how long it will take for my computer to run before actually running it - is this possible?

Thanks


1 Answer


Since you're using nested loops, instead of timing the whole thing, try timing the first iteration, or a small number of iterations, of the loop.

E.g. instead of

for (i in seq_along(vec1)) { 
    for (j in seq_along(vec2)) {
        for (k in seq_along(vec3)) {

try iterating over only the first few elements of each:

for (i in seq_along(vec1[1:3])) { 
    for (j in seq_along(vec2[1:3])) {
        for (k in seq_along(vec3[1:3])) {

or whatever makes sense for your use case.

Once you know the timing for a small subset of the iterations, you can extrapolate to estimate how long the full run may take.
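
As a rough sketch of that extrapolation (assuming each iteration costs about the same amount of time, which is reasonable here since every iteration fits a tree on the same 80,000 rows):

# time a 3 x 3 x 3 subset (27 of the full 5 x 5 x 5 = 125 combinations)
subset_time <- system.time({
    for (i in seq_along(vec1[1:3])) {
        for (j in seq_along(vec2[1:3])) {
            for (k in seq_along(vec3[1:3])) {
                # ... body of the loop ...
            }
        }
    }
})["elapsed"]

# scale up linearly to the full grid
n_subset <- 3 * 3 * 3
n_full <- length(vec1) * length(vec2) * length(vec3)
subset_time * n_full / n_subset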

Thank you! This is what I was thinking of... I wasn't sure if I could approach the problem this way. – stats_noob Feb 03 '21 at 01:02