
I am having some trouble achieving consistent behavior accessing attributes attached to reference class objects. For example,

testClass <- setRefClass('testClass',
  methods = list(print_attribute = function(name) print(attr(.self, name))))
testInstance <- testClass$new()
attr(testInstance, 'testAttribute') <- 1
testInstance$print_attribute('testAttribute')

And the R console cheerily prints NULL. However, if we try another approach,

testClass <- setRefClass('testClass',
  methods = list(initialize = function() attr(.self, 'testAttribute') <<- 1,
                 print_attribute = function(name) print(attr(.self, name))))
testInstance <- testClass$new()
testInstance$print_attribute('testAttribute')

and now we have 1 as expected. Note that the <<- operator is required, presumably because assigning to .self has the same restrictions as assigning to reference class fields. Note that if we had tried to assign outside of the constructor, say

testClass <- setRefClass('testClass',
  methods = list(set_attribute = function(name, value) attr(.self, name) <<- value,
                 print_attribute = function(name) print(attr(.self, name))))
testInstance <- testClass$new()
testInstance$set_attribute('testAttribute', 1)

we would be slapped with

Error in attr(.self, name) <<- value :
 cannot change value of locked binding for '.self'

Indeed, the documentation ?setRefClass explains that

The entire object can be referred to in a method by the reserved name .self ... These fields are read-only (it makes no sense to modify these references), with one exception. In principal, the .self field can be modified in the $initialize method, because the object is still being created at this stage.

I am happy with all of this, and agree with the author's decisions. What concerns me is the following. Going back to the first example above, if we ask for attr(testInstance, 'testAttribute') from the global environment, we see that it is 1!

Presumably, the .self that is used in the methods of the reference class object is stored in the same memory location as testInstance--it is the same object. Thus, by setting an attribute on testInstance successfully in the global environment, but not as a .self reference (as demonstrated in the first example), have we inadvertently triggered a copy of the entire object in the global environment? Or is the way attributes are stored "funny" in some way that the object can reside in the same memory, but its attributes are different depending on the calling environment?

I see no other explanation for why attr(.self, 'testAttribute') is NULL but attr(testInstance, 'testAttribute') is 1. The binding .self is locked once and for all, but that does not mean the object it references cannot change. If this is the desired behavior, it seems like a gotcha.

A final question is whether or not the preceding results imply attr<- should be avoided on reference class objects, at least if the resulting attributes are used from within the object's methods.

Robert Krzyzanowski
  • Why are you trying to set attributes on a reference class object instead of using fields??? – hadley Apr 01 '14 at 00:07
  • That is a great question! The reason is that I am assuming I do not own the reference class object -- it is a black box. Specifically, this problem arose when I was trying to write a duck typed tree structure that used arbitrary R objects as nodes (i.e. OOP-type agnostic). Now, I could write a wrapper class around it, like the XML package writes a wrapper around XML nodes, but I hate how convoluted and nested that sort of thing gets. Besides, R *already* has a canonical way of specifying meta-data: attributes. I found that while I did not need them, being able to specify attributes on – Robert Krzyzanowski Apr 01 '14 at 01:03
  • the "nodes" that composed my tree allowed for some nice optimizations. Whatever the case, if one is given a black box, like a reference class object, that one did not write and should not be allowed to modify, one should *still* be able to attach meta data to it in the form of attributes. If I *did* own the object, I agree with you that fields are the appropriate mechanism. Finally, you may ask: if I do not own the object, why am I worried about referencing attributes on `.self` within the object? Methods on reference class objects may accept functions as arguments (like blocks in Ruby) – Robert Krzyzanowski Apr 01 '14 at 01:06
  • and those functions can be executed within the environment of the reference class object, and thus are able to access `.self`. This is not specifically the problem I ran into, but it illustrates a theoretical use case. In my case, I subclassed a reference class object and wondered if I could access attributes on `.self` within a method for the aforementioned optimization purposes, arguing that the metadata attached to the object was not and should not be a field, because it encoded information about a totally different wrapping meta-structure, but ultimately I chose another, saner approach. – Robert Krzyzanowski Apr 01 '14 at 01:09
  • I would _strongly_ advise against attaching attributes to objects with reference semantics. It is not true that you can attach attributes to all R objects - it's not recommended to use attributes with environments and other objects with reference semantics. – hadley Apr 01 '14 at 14:53

1 Answer


I think I may have figured it out. I began by digging into the implementation of reference classes for references to .self.

# Deparse the body of every function in the methods namespace into one string each.
ns <- getNamespace('methods')
fn_names <- ls(ns, all.names = TRUE)
bodies <- sapply(fn_names, function(x) {
  fn <- get(x, envir = ns)
  if (is.function(fn)) paste(deparse(body(fn)), collapse = "\n") else NA
})
bodies <- bodies[!is.na(bodies)]  # drop the objects that are not functions

Now bodies holds a named character vector of all the functions in the methods package. We now look for .self:

goods <- bodies[grepl("\\.self", bodies)]
length(goods) # 4
names(goods) # [1] ".checkFieldsInMethod" ".initForEnvRefClass"  ".makeDefaultBinding"  ".shallowCopy"

So there are four functions in the methods package that contain the string .self. Inspecting them shows that .initForEnvRefClass is our culprit. We have the statement selfEnv$.self <- .Object. But what is selfEnv? Well, earlier in that same function, we have .Object@.xData <- selfEnv. Indeed, looking at the attributes on our testInstance from example one gives

$.xData
<environment: 0x10ae21470>

$class
[1] "testClass"
attr(,"package")
[1] ".GlobalEnv"

Peeking into attributes(attr(testInstance, '.xData')$.self) shows that we indeed can access .self directly using this approach. Notice that after executing the first two lines of example one (i.e. setting up testInstance), we have

identical(attributes(testInstance)$.xData$.self, testInstance)
# [1] TRUE

Yes! They are equal. Now, if we perform

attr(testInstance, 'testAttribute') <- 1
identical(attributes(testInstance)$.xData$.self, testInstance)
# [1] FALSE

the result is FALSE: adding an attribute to a reference class object has forced the creation of a copy, and .self is no longer identical to the object. However, if we check that

identical(attr(testInstance, '.xData'), attr(attr(testInstance, '.xData')$.self, '.xData'))
# [1] TRUE

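we see that the environment attached to the reference class object remains the same. The copy is therefore shallow: only the outer object and its attributes were duplicated, while the environment holding the fields and methods is shared, so the memory cost is small.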
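To make the shared environment concrete, here is a minimal sketch. The class `Counter`, its field `count`, and the method `self_attr` are hypothetical names invented for this illustration. It assumes the copy-on-modify behavior described above: the attribute lands on a copy of the outer object, while a field write goes into the one environment that both copies point to.

```r
library(methods)

# Hypothetical class, used only for illustration.
Counter <- setRefClass("Counter",
  fields = list(count = "numeric"),
  methods = list(
    # Reads an attribute from the .self stored in the object's environment.
    self_attr = function(name) attr(.self, name)
  ))

counter <- Counter$new(count = 0)

# Rebinds `counter` to a modified copy of the outer object;
# the .self inside the environment is left untouched.
attr(counter, "note") <- "tagged"

# Field assignment writes into the environment, which both copies share.
counter$count <- 5

outer_note  <- attr(counter, "note")                # attribute on the outer handle
inner_note  <- counter$self_attr("note")            # attribute as seen through .self
inner_count <- attr(counter, ".xData")$.self$count  # field as seen through .self
```

Under these assumptions, `outer_note` holds the attribute, `inner_note` is NULL, and `inner_count` reflects the later field update, so fields stay consistent across the two copies even though attributes do not.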

The answer to the final question is therefore yes: you should avoid setting attributes on reference class objects if you plan to use those attributes within the object's methods. The reason is that the .self object in a reference class object's environment should be considered fixed once and for all after the object has been initialized, and this includes the creation of additional attributes.

Since the .self object is stored in an environment that is attached as an attribute to the reference class object, it does not seem possible to avoid this problem without using pointer yoga--and R does not have pointers.
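The same effect can be reproduced without reference classes at all, which suggests it is ordinary copy-on-modify rather than anything special about attributes. The sketch below is a plain-R stand-in for the layout described above, not the methods package code; the names `handle` and `env_ref` are made up for the illustration.

```r
# An environment that will hold a back-reference to the object wrapping it.
env <- new.env()

# A value that carries the environment as an attribute, mimicking .xData.
handle <- structure(list(), env_ref = env)

# Store the value inside its own environment and lock the binding,
# mimicking the locked .self binding.
env$.self <- handle
lockBinding(".self", env)

# `handle` and env$.self share one value, so modifying `handle`
# rebinds it to a modified copy and leaves env$.self as it was.
attr(handle, "tag") <- 1

tag_outside <- attr(handle, "tag")
tag_inside  <- attr(env$.self, "tag")
same_env    <- identical(attr(handle, "env_ref"), attr(env$.self, "env_ref"))
```

Here `tag_inside` stays NULL while `tag_outside` is set, and both values still point at the same environment, which matches what was observed for testInstance.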

Edit

It appears that if you are crazy, you can do

unlockBinding('.self', attr(testInstance, '.xData'))
attr(attr(testInstance, '.xData')$.self, 'testAttribute') <- 1
lockBinding('.self', attr(testInstance, '.xData'))

and the problems above magically go away.
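If you do go down this road, the unlock and re-lock steps can be wrapped so the binding is locked again even when something fails in between. This is only a sketch: `set_self_attr`, `Node`, and `self_attr` are hypothetical names, and the approach relies on the `.xData` and `.self` internals examined above, which are implementation details rather than a documented interface.

```r
library(methods)

# Hypothetical helper: set an attribute on the .self stored inside a
# reference class object's environment, re-locking the binding afterwards.
set_self_attr <- function(obj, name, value) {
  env <- attr(obj, ".xData")
  self <- get(".self", envir = env)
  attr(self, name) <- value            # modifies a local copy

  unlockBinding(".self", env)
  on.exit(lockBinding(".self", env), add = TRUE)  # always re-lock
  assign(".self", self, envir = env)   # store the modified copy back

  self  # returned so the caller can rebind its own handle
}

# Hypothetical class, used only for illustration.
Node <- setRefClass("Node",
  methods = list(self_attr = function(name) attr(.self, name)))

node <- Node$new()
node <- set_self_attr(node, "depth", 2L)

depth_inside  <- node$self_attr("depth")  # via .self, inside a method
depth_outside <- attr(node, "depth")      # via the outer handle
```

Rebinding `node` to the returned value keeps the outer handle and `.self` in step. Any other variable that still refers to the old copy will not see the attribute.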

Robert Krzyzanowski