
Suppose we have a large nested list of arbitrary size (the depth may exceed 100). The list contains up to several thousand objects, at locations not known in advance, that we need to read and modify often. We therefore keep a separate variable holding pointers to these objects. What is the fastest way to create such pointers?

So far I can think of four approaches, shown in the code below.

First, we need to have a dummy nested list object for the sake of demonstration:

create_nested_list <- function(depth) {
  myList <- list()
  if (depth > 0) {
    depth <- depth - 1
    myList$level <- paste0('depth_', depth)
    myList[[paste0('Depth_', depth, '_A')]] <- create_nested_list(depth)
    myList[[paste0('Depth_', depth, '_B')]] <- create_nested_list(depth)
  }
  return(myList)
}

myList <- create_nested_list(10)

Then, let's say we will want to see and often modify the following attribute in the list:

myList$Depth_9_B$Depth_8_A$Depth_7_A$Depth_6_A$Depth_5_A$Depth_4_A$Depth_3_A$Depth_2_A$level
  1. The expression above accesses the element directly. However, it does not work for our case: assigning its result to a variable creates a copy of the object (R lists have copy-on-modify semantics) rather than a pointer, so changes to that variable never reach myList.

  2. The Base-R solution would be saving a path to the object in a string and evaluating the expression.

path <- '$Depth_9_B$Depth_8_A$Depth_7_A$Depth_6_A$Depth_5_A$Depth_4_A$Depth_3_A$Depth_2_A$level'
eval(str2lang(paste0('myList', path))) 
  3. We can also use the "pointr" package to create a pointer object.
library(pointr)
ptr('pointer_to_the_object', 'myList$Depth_9_B$Depth_8_A$Depth_7_A$Depth_6_A$Depth_5_A$Depth_4_A$Depth_3_A$Depth_2_A$level')
pointer_to_the_object
  4. Instead of a plain list, we can use R6 (reference class) objects. In that case, however, each element of the list must itself be a separate R6 object, so the way the base list is built has to change.

library(R6)
nestedR6 <- R6Class(
  'myList',
  cloneable = FALSE,
  lock_objects = FALSE,
  public = list(
    ref_list = NULL,
    initialize = function(depth) {
      if (depth > 0) {
        depth <- depth - 1
        self$level <- paste0('depth_', depth)
        self[[paste0('Depth_', depth, '_A')]] <- nestedR6$new(depth)
        self[[paste0('Depth_', depth, '_B')]] <- nestedR6$new(depth)
      }
    }
  )
)

myListR6 <- nestedR6$new(10)
R6obj <- myListR6$Depth_9_B$Depth_8_A$Depth_7_A$Depth_6_A$Depth_5_A$Depth_4_A$Depth_3_A$Depth_2_A
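
To make the difference between the first two approaches concrete, here is a minimal base-R sketch on a tiny stand-in list. The names lst, x and keys are made up for illustration, and the last part (recursive `[[` indexing with a character vector) is an alternative I am adding here, not one of the four approaches listed above.

```r
# Small stand-in for the nested list; all names here are illustrative.
lst <- list(a = list(b = list(level = 'old')))

# Approach 1: the extracted value is a copy, so changing it leaves `lst` untouched.
x <- lst$a$b$level
x <- 'changed'
lst$a$b$level   # still 'old'

# Approach 2: build an assignment from the stored path string and evaluate it.
path <- '$a$b$level'
eval(str2lang(paste0('lst', path, " <- 'new'")))
lst$a$b$level   # now 'new'

# Alternative (not in the original): keep the path as a character vector and
# rely on recursive `[[` indexing on lists, which avoids parsing text.
keys <- c('a', 'b', 'level')
lst[[keys]] <- 'newer'
lst[[keys]]     # 'newer'
```

The character-vector path only works for lists, not for environments or R6 objects, so it is a variant of approach 2 rather than a replacement for the reference-based approaches.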

Then, we can compare the speed of all 4 methods:

library(microbenchmark)
library(ggplot2)

mbm <- microbenchmark(direct = myList$Depth_9_B$Depth_8_A$Depth_7_A$Depth_6_A$Depth_5_A$Depth_4_A$Depth_3_A$Depth_2_A$level,
               direct2 = myList[['Depth_9_B']][['Depth_8_A']][['Depth_7_A']][['Depth_6_A']][['Depth_5_A']][['Depth_4_A']][['Depth_3_A']][['Depth_2_A']][['level']],
               eval_expression = {
                 eval(str2lang(paste0('myList', path)))
               }, 
               pointer = pointer_to_the_object,
               R6_Class = R6obj[['level']],
               times = 100)
autoplot(mbm)


Surprisingly, access via the pointr pointer is the slowest, and access through the R6 reference is even faster than direct access. Unfortunately, R6 is still not the optimal solution, because building the nested structure from R6 objects is significantly slower than building it from plain lists.

microbenchmark(
  S3 = create_nested_list(10),
  R6 = nestedR6$new(10)
)


LordRudolf
  • 63
  • 8

1 Answer


I've found a solution.

Instead of using lists, we can use environment objects. R6 objects are themselves built on environments, so a plain environment is essentially an R6 object without the class machinery.

## Create the environment object (almost the same as list)
create_nested_environment <- function(depth) {
  
  myList = new.env()
  
  if(depth > 0) {
    depth <- depth - 1
    myList$level <- paste0(paste0('depth_', depth))
    myList[[paste0('Depth_', depth, '_A')]] <- create_nested_environment(depth)
    myList[[paste0('Depth_', depth, '_B')]] <- create_nested_environment(depth)
  }
  
  return(myList)
}

myList <- create_nested_list(10)

myEnv <- create_nested_environment(10)
e <- myEnv$Depth_9_B$Depth_8_A$Depth_7_A$Depth_6_A$Depth_5_A$Depth_4_A$Depth_3_A$Depth_2_A
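
The reason e can serve as a pointer is that environments have reference semantics: assigning one to a new name does not copy it. A minimal sketch on a two-level structure (root, child and the list counterparts are illustrative names, not part of the code above):

```r
# Illustrative two-level structure; `root`, `child` and `level` are made-up names.
root <- new.env()
root$child <- new.env()
root$child$level <- 'old'

# `e` refers to the same environment as root$child; nothing is copied.
e <- root$child
e$level <- 'new'

root$child$level          # 'new': the change is visible through the root
identical(e, root$child)  # TRUE: both names point at one environment

# The same steps with lists: `l` is an independent copy.
root_list <- list(child = list(level = 'old'))
l <- root_list$child
l$level <- 'new'
root_list$child$level     # still 'old'
```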

Now we can test the performance of all the methods:

mbm <- microbenchmark(direct = myList$Depth_9_B$Depth_8_A$Depth_7_A$Depth_6_A$Depth_5_A$Depth_4_A$Depth_3_A$Depth_2_A$level,
                      direct2 = myList[['Depth_9_B']][['Depth_8_A']][['Depth_7_A']][['Depth_6_A']][['Depth_5_A']][['Depth_4_A']][['Depth_3_A']][['Depth_2_A']][['level']],
                      eval_expression = {
                        eval(str2lang(paste0('myList', path)))
                      }, 
                      pointer = pointer_to_the_object,
                      R6_Class = R6obj[['level']],
                      env = e[['level']],
                      env_direct = myEnv$Depth_9_B$Depth_8_A$Depth_7_A$Depth_6_A$Depth_5_A$Depth_4_A$Depth_3_A$Depth_2_A$level,
                      times = 1000)
autoplot(mbm)

Pointers to environment objects are even faster than direct access in lists or pointers to R6 objects.

And the speed comparison between lists, R6 classes and raw environment objects:

mbm <- microbenchmark(
  S3 = create_nested_list(10),
  R6 = nestedR6$new(10),
  env = create_nested_environment(10)
)
autoplot(mbm)

Creating nested environments takes about the same time as creating nested lists.
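
Coming back to the original goal of keeping a separate variable with pointers to many objects: with environments this could be a plain list of references. The sketch below is only one possible way to build it. The helpers build_env and collect_nodes, the registry variable and the 'root/...' key format are all hypothetical names introduced for this example.

```r
# Same construction as create_nested_environment, repeated so the sketch is self-contained.
build_env <- function(depth) {
  env <- new.env()
  if (depth > 0) {
    depth <- depth - 1
    env$level <- paste0('depth_', depth)
    env[[paste0('Depth_', depth, '_A')]] <- build_env(depth)
    env[[paste0('Depth_', depth, '_B')]] <- build_env(depth)
  }
  env
}

# Hypothetical helper: walk the tree once and return a named list of references,
# keyed by a slash-separated path. The list stores the environments themselves,
# not copies of them.
collect_nodes <- function(env, prefix = 'root') {
  out <- list()
  out[[prefix]] <- env
  for (nm in ls(env)) {
    child <- env[[nm]]
    if (is.environment(child)) {
      out <- c(out, collect_nodes(child, paste(prefix, nm, sep = '/')))
    }
  }
  out
}

root <- build_env(3)
registry <- collect_nodes(root)
length(registry)   # 15 nodes for depth 3 (1 + 2 + 4 + 8)

# Modify a node through the registry; the change is visible from the root.
node <- registry[['root/Depth_2_B/Depth_1_A']]
node$level <- 'edited'
root$Depth_2_B$Depth_1_A$level   # 'edited'
```

In practice the registry would hold only the few thousand nodes of interest rather than every node, and the keys could be anything convenient. I have not benchmarked this part.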


LordRudolf