I'm writing a program in R using R6 (my boss's preference). It has to do some heavy-duty number crunching, so I'm trying to get the key variables in the R6 classes to be modified in place. Unfortunately, what works for modifying variables in place in normal R doesn't seem to work inside an R6 class. I've constructed a minimal example below. You can clearly see that the variable inside the R6 class jumps to a new memory address after the function runs. Outside the R6 class, doing exactly the same thing causes no copy. Can anyone give me any advice as to why this happens and how I might get the variables in the class to be modified in place?

library(R6)

my_r6 <- R6Class("my_r6",
  public = list(
    test = function() {
      for (i in 1:5) {
        private$x$a[i] <- 3
      }
    }
  ),
  private = list(
    x = list(a = c(1, 2, 3, 4, 5))
  )
)

# Inside the R6 class: tracemem() reports a copy
temp_r6 <- my_r6$new()
tracemem(temp_r6$.__enclos_env__$private$x$a)
temp_r6$test()

# Outside the class: the same loop reports no copy
y <- list(b = c(1, 2, 3, 4, 5))
tracemem(y$b)
for (i in 1:5) {
  y$b[i] <- 3
}
Peter Clark
  • Why should your code care whether something is modified in place? – Hong Ooi Apr 11 '18 at 14:40
  • Because it's part of a high-performance computing Monte Carlo simulation. Modifying in place is a lot faster on large data structures. If you want to know why we're trying to do high-performance computing in R instead of C, see my previous comment about my boss. – Peter Clark Apr 12 '18 at 09:52

1 Answer

Your code isn't testing what you think it is. tracemem() does not tell you whether or not an object has been modified in place. The Details section of ?tracemem says it tells you if the C function duplicate() has been called.

And that (only?) happens when more than one name/symbol refers to the same data. For example:

y <- list(b=1:5)
tracemem(y)
# [1] "<0000000009226B08>"
y[1] <- pi/2  # nothing else points to the same memory as 'y'
z <- y        # now 'z' and 'y' point to the same memory
z[1] <- pi    # this requires duplication of y
# tracemem[0x0000000009226b08 -> 0x00000000092228c8]:

Also, your code outside of the R6 class isn't the same as the code inside it. The private object is an environment, so a more accurate comparison is below. You can see it shows the same duplication as the R6 code.

e <- new.env()
e$y <- list(b = c(1, 2, 3, 4, 5))
tracemem(e$y$b)
for (i in 1:5) {
  e$y$b[i] <- 3
}
# tracemem[0x0000000008ff9c78 -> 0x000000000dac2298]:

I'm not going to spend time investigating or explaining why the duplication happens or a way to avoid it, just in case this is an XY problem. Let me know if this is very close to your actual problem and I'll try to answer with an edit.
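
For readers who want to limit how often the environment is written to, one commonly suggested pattern is to pull the vector into a local variable, run the loop on the local, and assign the result back once. The sketch below uses a plain environment rather than R6 so that it needs only base R; the helper name `fill_local` is hypothetical, and whether this removes every copy depends on the R version, so treat it as an assumption to verify rather than a guarantee.

```r
# Sketch only: `fill_local` is a hypothetical helper, not part of R or R6.
e <- new.env()
e$y <- list(b = c(1, 2, 3, 4, 5))

fill_local <- function(env, value = 3) {
  b <- env$y$b                 # read the vector out of the environment once
  for (i in seq_along(b)) {
    b[i] <- value              # the loop only touches the local vector
  }
  env$y$b <- b                 # write the result back once
  invisible(env)
}

fill_local(e)
print(e$y$b)
```

The environment is read once and written once regardless of the loop length. For a fill like this one, a vectorised assignment such as `e$y$b[] <- 3` would do the same job without a loop.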

Joshua Ulrich
  • Well, I could be wrong, but my understanding of tracemem is that it tracks copy operations for variables from one block of RAM to another. If a variable is 'modified in place' there will be no copy; the original block of RAM will just be overwritten. I did try using the address function from the pryr library to track this, but pryr seems to treat all variables in an R6 object as if they had the same address. I really do need a way to track when R copies atomic data types and to stop it doing this in key parts of the program ... so no, not an XY problem. – Peter Clark Apr 12 '18 at 09:47
  • @PeterClark: your understanding of `tracemem()` is incorrect. It only tracks copying that occurs via calls to `duplicate()` (as it says in the help page). It does not track other types of allocation / copying. R works really hard to maintain pass-by-value semantics, so it's generally very difficult to modify atomic types by reference from R. That said, it's easy to do via the C API, with all the caveats about using mutable data structures. – Joshua Ulrich Apr 12 '18 at 14:53
  • Unfortunately, I've been asked for a solution that doesn't involve compiling any C code. Unless there is a way to call the C API directly from R, that doesn't help me much. Personally I'd rather write the whole project in C, but that's not an option. – Peter Clark Apr 13 '18 at 08:38
  • Your description of the actual problem seems to be an objective and constraints with an essentially unfeasible solution. If I were in your position, I would revisit the requirement of being able to modify R objects by reference. I still think this may be an XY problem, since you haven't demonstrated that you have a performance bottleneck caused by memory (re-)allocation similar to your contrived example. – Joshua Ulrich Apr 13 '18 at 11:18
  • Yeah, I won't be able to demonstrate that until the code is basically finished, at which point I'll have a load of code I'll probably need to scrap. So that's basically what I'm doing. Maybe if I show them that the finished program suffers from memory overflow and/or very slow performance, they'll let me rewrite it in C. I'd just rather not have to finish the R code first. – Peter Clark Apr 18 '18 at 09:56
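
The question of whether a real bottleneck exists, raised in the comments above, can be probed before the full program is written. The following is a rough timing sketch using only base R's `system.time()`; the vector length `n` and the element name `b_local` are illustrative assumptions, and timings will vary by machine and R version, so no expected numbers are shown.

```r
# Illustrative micro-benchmark; `n` and `b_local` are assumptions.
n <- 1e4
e <- new.env()
e$y <- list(b = numeric(n))

# Variant 1: assign through the environment on every iteration.
t_env <- system.time(
  for (i in seq_len(n)) e$y$b[i] <- 3
)

# Variant 2: loop over a local vector, then store it once.
b <- numeric(n)
t_local <- system.time({
  for (i in seq_len(n)) b[i] <- 3
  e$y$b_local <- b
})

# Elapsed seconds for each variant.
print(c(env = t_env[["elapsed"]], local = t_local[["elapsed"]]))
```

If the two timings are close at realistic sizes, copying is unlikely to be the limiting factor; if they differ widely, that is concrete evidence to bring to the design discussion.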