0

Especially with long script and many functions, it may be hard to identify which parts of the code takes the most time to process.

Identifying these are important because speeding up the parts of the code that are time-consuming, either when run once or because the code is repeated often, provides the greatest reductions in processing time.

How to track this efficiently?

Christian
  • 932
  • 1
  • 7
  • 22

1 Answers1

0

Consider the following three functions: Start(), End(), and Timestamps().

To start tracking the time of a process, you add the Start() function at the beginning of the code you are about to process. You must specify a label that describes what code is being time-tracked, and you must specify which global variable you want the time stamps to be saved to.

Equivalently, End() will stop tracking the time for the last process being tracked. Here, you only need to provide the global variable as an argument. It is not necessary to apply End() explicitly because starting a new time-tracking instance with Start() will do so automatically. However, End() is useful if you explicitly want to end the time-tracking.

And finally, Timestamps() will summarize all timestamps by their labels, and the function will provide the mean, coefficient of variation, number of timestamps with this label, and the total time spent for timestamps with this label. These results will be printed to the console.

library(dplyr)

The Start() function:

Start <- function(label, time_stamps){

  # ————————————————————————————————————————————————————————————————————————————
  # See if timestamp exists and create a new global env variable if not
  # ————————————————————————————————————————————————————————————————————————————

  time_stamps_exists <- exists(as.character(substitute(time_stamps)), envir=.GlobalEnv)

  if (!time_stamps_exists){
    assign(
      x = deparse(substitute(time_stamps)), 
      value = data.frame(
        label = character(),
        start = character(),
        end = character(),
        duration = numeric(),
        stringsAsFactors = F
      ), 
      env = .GlobalEnv)
  }


  # ————————————————————————————————————————————————————————————————————————————
  # Get the global env variable
  # ————————————————————————————————————————————————————————————————————————————

  df <- get(
    x = deparse(substitute(time_stamps)), 
    envir = .GlobalEnv)


  # ————————————————————————————————————————————————————————————————————————————
  # Add end stamp and duration on last instance if not done yet
  # ————————————————————————————————————————————————————————————————————————————

  if (nrow(df) > 0 && is.na(df$end[nrow(df)])){
    end_stamp <- 
      as.POSIXlt(Sys.time(), "%Y-%m-%d %H:%M:%OS3", tz = "CET") %>% 
      format(., "%Y-%m-%d %H:%M:%OS3")

    df$end[nrow(df)] <- end_stamp

    df$duration[nrow(df)] <- 
      difftime(
        df$end[nrow(df)], 
        df$start[nrow(df)], 
        units=c("secs")) %>%
      as.numeric()
  }


  # ————————————————————————————————————————————————————————————————————————————
  # Create new row
  # ————————————————————————————————————————————————————————————————————————————

  df. <- data.frame(
    label = label,
    start = 
      as.POSIXlt(Sys.time(), "%Y-%m-%d %H:%M:%OS3", tz = "CET") %>% 
      format(., "%Y-%m-%d %H:%M:%OS3"),
    end = as.POSIXct("", format="%Y-%m-%d"),
    duration = NaN,
    stringsAsFactors = F
  )

  df <- rbind(df, df.)


  # ————————————————————————————————————————————————————————————————————————————
  # Save to global env variable
  # ————————————————————————————————————————————————————————————————————————————

  assign(
    x = deparse(substitute(time_stamps)), 
    value = df, 
    env = .GlobalEnv)
}

The End() function:

End <- function(time_stamps, label = NULL){

  # ————————————————————————————————————————————————————————————————————————————
  # Get the global env variable
  # ————————————————————————————————————————————————————————————————————————————

  df <- get(
    x = deparse(substitute(time_stamps)),
    envir = .GlobalEnv)


  # ————————————————————————————————————————————————————————————————————————————
  # Add end stamp and duration on last instance if not done yet
  # ————————————————————————————————————————————————————————————————————————————

  end_stamp <-
    as.POSIXlt(Sys.time(), "%Y-%m-%d %H:%M:%OS3", tz = "CET") %>%
    format(., "%Y-%m-%d %H:%M:%OS3")

  df$end[nrow(df)] <- end_stamp

  df$duration[nrow(df)] <-
    difftime(
      df$end[nrow(df)],
      df$start[nrow(df)],
      units=c("secs")) %>%
    as.numeric()


  # ————————————————————————————————————————————————————————————————————————————
  # Save to global env variable
  # ————————————————————————————————————————————————————————————————————————————

  assign(
    x = deparse(substitute(time_stamps)),
    value = df,
    env = .GlobalEnv)

  if (!is.null(label)){
    printf("Ended %s", label)
  }
}

The Timestamps() function

Timestamps <- function(time_stamps, end_last = FALSE, head = NULL){

  # ————————————————————————————————————————————————————————————————————————————
  # Get the global env variable
  # ————————————————————————————————————————————————————————————————————————————

  df <- get(
    x = deparse(substitute(time_stamps)), 
    envir = .GlobalEnv)


  # ————————————————————————————————————————————————————————————————————————————
  # End last timestmap if necessary and reload the global env variable
  # ————————————————————————————————————————————————————————————————————————————

  if (nrow(df) > 0 && is.na(df$end[nrow(df)])){
    End(time_stamps, "timestamp to study summary")

    df <- get(
      x = deparse(substitute(time_stamps)), 
      envir = .GlobalEnv)
  }


  # ————————————————————————————————————————————————————————————————————————————
  # Create summary to be printed
  # ————————————————————————————————————————————————————————————————————————————

  output <- data.frame(
    label = unique(df$label),
    mean  = c(NaN),
    cv    = c(NaN),
    n     = c(NaN)
  )

  for (row in 1:nrow(output)) {
    label_in_question <- output$label[row]

    durations <- df$duration[df$label == label_in_question]

    output$mean[row] <- mean(durations)
    output$cv[row] <- cv(durations)
    output$n[row] <- sum(df$label == label_in_question)
  }

  output$total_time <- output$mean * output$n

  output <- output[order(output$mean, decreasing = TRUE),]

  if (!is.null(head)){
    print(head(output, head))
  }else{
    print(output)
  }
}

Example

for (i in 1:3) {
  Start("Running the 1st for loop", foo)
  Sys.sleep(2)
  End(foo)
}

Timestamps(foo)

However, if End() is not called before the next Start(), then the timestamping of the last instance will automatically be stopped before starting a new instance. The code above is equivalent to, and produces identical results as, the following code:

for (i in 1:3) {
  Start("Running the 1st for loop", foo)
  Sys.sleep(2)
}

Timestamps(foo, end_last = TRUE)

And finally, running Timestamps(foo) prints the following output:

   label                     mean   cv          n  total_time
1  Running the 1st for loop  2.004  0.08643715  3  6.012
Christian
  • 932
  • 1
  • 7
  • 22