I'm implementing an S4 class that contains a data.table, and attempting to implement [ subsetting of the object (as described here) such that it also subsets the data.table. For example (defining just i subsetting):

library(data.table)

.SuperDataTable <- setClass("SuperDataTable", representation(dt="data.table"))

setMethod("[", c("SuperDataTable", "ANY", "missing", "ANY"),
    function(x, i, j, ..., drop=TRUE)
{
    initialize(x, dt=x@dt[i])
})

d = data.table(a=1:4, b=rep(c("x", "y"), each=2))
s = new("SuperDataTable", dt=d)

At this point, subsetting with a numeric vector (s[1:2]) works as desired (it subsets the data.table in the slot). However, I'd like to add the ability to subset using an expression. This works for the data.table itself:

s@dt[b == "x"]
#    a b
# 1: 1 x
# 2: 2 x

But not for the S4 [ method:

s[b == "x"]
# Error: object 'b' not found

The problem appears to be that arguments in the signature of an S4 method are not evaluated using R's traditional lazy evaluation; see here:

All arguments in the signature of the generic function will be evaluated when the function is called, rather than using the traditional lazy evaluation rules of S. Therefore, it's important to exclude from the signature any arguments that need to be dealt with symbolically (such as the first argument to function substitute).

This explains why it doesn't work, but not how one can implement this kind of subsetting, since i and j are included in the signature of the generic. Is there any way to have the i argument not be evaluated immediately?
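To make the eager evaluation concrete, here is a minimal sketch comparing an ordinary closure with an S4 generic. The names `plain` and `show_expr` are hypothetical and exist only for this demonstration; the second argument is deliberately an expression that cannot be evaluated.

```r
library(methods)

# Ordinary closure: 'expr' stays an unevaluated promise, so substitute() can
# capture the code even though 'undefined_symbol' does not exist.
plain <- function(x, expr) deparse(substitute(expr))
lazy_result <- plain(1, undefined_symbol + 1)

# Hypothetical S4 generic with both arguments in the signature.
setGeneric("show_expr", function(x, expr) standardGeneric("show_expr"))
setMethod("show_expr", c("numeric", "ANY"),
          function(x, expr) deparse(substitute(expr)))

# Dispatch needs the class of 'expr', so the expression is evaluated before
# the method body runs, and the call fails.
eager_result <- tryCatch(show_expr(1, undefined_symbol + 1),
                         error = function(e) "error during dispatch")
```

The closure returns the deparsed code, while the S4 call never reaches its method body, which is the same failure mode as `s[b == "x"]`.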

David Robinson
  • I guess I'm confused. I normally think of an initialization step as part of a `new` operation rather than part of an extraction operation. – IRTFM Mar 23 '14 at 18:00
  • @IShouldBuyABoat The use of `initialize` comes from [this suggestion](http://stackoverflow.com/questions/10961842/how-to-define-the-subset-operators-for-a-s4-class) (also linked to above), which works because `initialize` is a copy constructor. If you wanted to replace it with `x@dt = x@dt[i]; return(x)` you could: it has nothing to do with the issue above. – David Robinson Mar 23 '14 at 18:37
  • How could R figure out what method to call if it doesn't know the class of the arguments? – hadley Mar 23 '14 at 23:19
  • @hadley: One could imagine it working if the signature for that argument were `ANY`, and therefore that it didn't need to know the class of the argument (though I understand that it doesn't). – David Robinson Mar 23 '14 at 23:51

1 Answer

You may be out of luck on this one. From the R developer notes,

Arguments appearing in the signature of the generic will be evaluated as soon as the generic function is called; therefore, any arguments that need to take advantage of lazy evaluation must not be in the signature. These are typically arguments treated literally, often via the substitute() function. For example, if one wanted to turn substitute() itself into a generic, the first argument, expr, would not be in the signature since it must not be evaluated but rather treated as a literal.

Furthermore, due to method caching,

All the arguments in the full signature are evaluated as described above, not just the active ones. Otherwise, in special circumstances the behavior of the function could change for one method when another method was cached, definitely undesirable.

I would follow the example from the data.table package writers and use an S3 object (see line 304 of R/data.table.R in their source code). Your S3 object can still create and manipulate an S4 object underneath to maintain the semi-static typing feature.
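A minimal sketch of that S3-wrapper idea, assuming the S4 object is kept in a list element. The constructor `super_dt` and the element name `s4` are hypothetical, not something data.table or Bioconductor prescribes. Because the wrapper is a plain S3 object, `[` dispatches to the S3 method without evaluating `i`, so `substitute()` can capture the expression.

```r
library(data.table)

# S4 class that holds the typed payload, as in the question
setClass("SuperDataTable", representation(dt = "data.table"))

# Hypothetical S3 wrapper: a list carrying the S4 object
super_dt <- function(dt) {
  structure(list(s4 = new("SuperDataTable", dt = dt)), class = "super_dt")
}

# S3 method for `[`: 'i' arrives unevaluated, so it can be captured as code
`[.super_dt` <- function(x, i, ...) {
  i_expr <- substitute(i)
  inner <- x$s4
  # Evaluate the captured expression with the table's columns in scope,
  # falling back to the caller's environment for other variables.
  rows <- eval(i_expr, inner@dt, parent.frame())
  inner@dt <- inner@dt[rows]
  x$s4 <- inner
  x
}

d <- data.table(a = 1:4, b = rep(c("x", "y"), each = 2))
w <- super_dt(d)
res <- w[b == "x"]   # subsets the data.table inside the S4 object
```

Only `i` is handled here, mirroring the question; a fuller version would also need `j` and a `print` method for the wrapper.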

We can't get extraordinarily clever:

 ‘[’ is a primitive function;  methods can be defined, but the generic function is implicit, and cannot be changed.

Defining both an S3 and S4 method will dispatch the S3 method, which makes it seem like we should be able to route around the S4 call and dispatch it manually, but unfortunately the argument evaluation still occurs! You can get close by borrowing plyr::., which would give you syntax like:

s <- new('SuperDataTable', dt = as.data.table(iris))
s[.(Sepal.Length > 4), 2]

Not ideal, but closer than anything else.
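The same quoting idea can be sketched without plyr by passing the expression through base R's `quote()`. This is an illustration under assumptions rather than a drop-in solution: the `is.language()` check and the variable name `rows` are my own choices. Eager evaluation of `i` then yields a call object instead of an error, and the method evaluates that call against the table's columns.

```r
library(data.table)

setClass("SuperDataTable", representation(dt = "data.table"))

setMethod("[", c("SuperDataTable", "ANY", "missing", "ANY"),
    function(x, i, j, ..., drop = TRUE)
{
    # If the caller passed quoted code, evaluate it with the columns in scope;
    # otherwise treat i as an ordinary index.
    rows <- if (is.language(i)) eval(i, x@dt, parent.frame()) else i
    initialize(x, dt = x@dt[rows])
})

d <- data.table(a = 1:4, b = rep(c("x", "y"), each = 2))
s <- new("SuperDataTable", dt = d)

by_expr  <- s[quote(b == "x")]   # rows where b is "x"
by_index <- s[1:2]               # plain numeric indexing still works
```

The cost is the same as with `.()`: callers have to wrap the expression explicitly, since nothing inside the method can recover the unevaluated code.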

Robert Krzyzanowski
  • Thanks for your detailed answer. Could you elaborate on "create and manipulate an S4 object underneath"? Do you mean having the S3 object contain a member that is an S4 object (like `s$myS4obj`)? The main reason I'm trying to get it to work in S4 is [Bioconductor's package standards](http://www.bioconductor.org/developers/package-submission/). – David Robinson Mar 23 '14 at 23:49
  • @DavidRobinson You can implement standard S3 interfaces, but either keep the S4 object in a list on the S3 object or stick it in an attribute. Make sure to write a `print` method so your users don't see the deep internals. I am not sure about Bioconductor's package standards, but like I said, if you really really need a pure S4 object you can use the `.()` trick like above. As far as my research shows, no other solution will give you what you want without trying to write C code that messes with R internals. – Robert Krzyzanowski Mar 24 '14 at 15:05