I want to create a subclass for data.frame (and the like). I want a function that creates the class information and another that removes it.
df <- data.frame(x = 1:10)
This would be the “R sanctioned” way of adding classes, I think. Adding and then removing the class works as expected and returns an object equal to the original:
df2 <- df
class(df2$x) <- c("someclass", class(df2$x))
class(df2$x) <- class(df2$x)[class(df2$x) != "someclass"]
all.equal(df2, df)
#> [1] TRUE
But this method doesn’t work if I want to change the object by reference, because class<- seems to make a copy of the object:
add.class <- function(object) {
  class(object$x) <- c("someclass", class(object$x))
  return(invisible(object))
}
add.class(df)
class(df$x) # doesn't work
#> [1] "integer"
The data.table documentation says the correct way of setting attributes by reference is using setattr and, indeed, this function does work as intended:
add.class <- function(object) {
  data.table::setattr(object$x, "class", c("someclass", class(object$x)))
  return(invisible(object))
}
add.class(df)
class(df$x) # works!
#> [1] "someclass" "integer"
But the problem is that when I use the same logic to remove the class, the object does not go back exactly to its previous state:
remove.class <- function(object) {
  data.table::setattr(object$x, "class", class(object$x)[class(object$x) != "someclass"])
  return(invisible(object))
}
df <- data.frame(x = 1:10)
df2 <- data.table::copy(df)
add.class(df2)
remove.class(df2)
all.equal(df, df2) # Not equal!
#> [1] "Component \"x\": Attributes: < target is NULL, current is list >"
#> [2] "Component \"x\": target is numeric, current is integer"
all.equal(class(df$x), class(df2$x)) # But, equal classes??
#> [1] TRUE
The problem is not specific to data.table::setattr(), since the regular attr<- function has the same issue when changing the class attribute:
df <- data.frame(x = 1:10)
df2 <- data.table::copy(df)
attr(df$x, "class") <- c("someclass", class(df$x))
attr(df$x, "class") <- class(df$x)[class(df$x) != "someclass"]
all.equal(df, df2)
#> [1] "Component \"x\": Attributes: < Modes: list, NULL >"
#> [2] "Component \"x\": Attributes: < Lengths: 1, 0 >"
#> [3] "Component \"x\": Attributes: < names for target but not for current >"
#> [4] "Component \"x\": Attributes: < current is not list-like >"
#> [5] "Component \"x\": target is integer, current is numeric"
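To see where the two objects differ even though class() reports the same thing, it may help to look at the stored attribute directly. The sketch below uses only base R, with a plain vector `x` standing in for the column df$x. It reflects my reading of how attr<- behaves and is not a statement about data.table internals.

```r
# Base R only; `x` stands in for the column df$x from the example above.
x <- 1:10
attr(x, "class") <- c("someclass", class(x))
attr(x, "class") <- class(x)[class(x) != "someclass"]

class(x)        # "integer", same as for a plain integer vector
oldClass(x)     # the stored class attribute: "integer" rather than NULL
attributes(x)   # a list with a single element, class = "integer"
is.object(x)    # TRUE, whereas is.object(1:10) is FALSE
```

So the round trip seems to leave an explicit class attribute behind, where the original vector only had the implicit class. class() cannot tell these two cases apart, but all.equal() also compares attributes.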
My guess was that class<- does something different than attr<- and setattr(). But then I realised that it only seems to happen if I change the class of a column. Changing the class of the entire object works as expected.
add.class <- function(object) {
  data.table::setattr(object, "class", c("someclass", class(object)))
  return(invisible(object))
}
remove.class <- function(object) {
  data.table::setattr(object, "class", class(object)[class(object) != "someclass"])
  return(invisible(object))
}
df <- data.frame(x = 1:10)
df2 <- data.table::copy(df)
add.class(df2)
remove.class(df2)
all.equal(df, df2)
#> [1] TRUE
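Both observations (class<- versus attr<-, and column versus whole object) can be compared side by side in base R. This is only a sketch of what I believe is going on: class<- appears to drop the attribute when it is handed nothing but the implicit class, while attr<- stores whatever it is given. I have not checked setattr() separately here, although its output above looks like the attr<- case.

```r
x <- 1:10
oldClass(x)                   # NULL: "integer" is only the implicit class

y <- x
class(y) <- class(y)          # class<- given just the implicit class
oldClass(y)                   # NULL: no class attribute is stored

z <- x
attr(z, "class") <- class(z)  # attr<- stores the value as given
oldClass(z)                   # "integer"

df <- data.frame(x = 1:10)
oldClass(df)                  # "data.frame": stored explicitly from the start
```

A data.frame already carries an explicit class attribute, so adding and removing "someclass" on the whole object ends with the same stored value it started with. A bare integer column starts with no class attribute at all, which would explain why only the column case fails.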
This is not such a huge problem but it really makes it a pain in the ass to properly test, since I can’t use testthat::expect_equal() for unit tests.
So, what’s the proper way of changing the class of a data.frame column by reference?
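One direction that might work, offered as a sketch and not a confirmed answer: when removing the class, store NULL if only the implicit class would be left. The helper `class_without()` below is hypothetical (it exists neither in base R nor in data.table), and the example uses attr<- on a plain vector so that it runs without extra packages.

```r
# Hypothetical helper, not part of base R or data.table.
# Returns the value to store in the "class" attribute after dropping `cls`:
# NULL when only the implicit class is left, so the attribute gets removed.
class_without <- function(x, cls) {
  remaining <- setdiff(class(x), cls)
  implicit <- class(unclass(x))
  if (identical(remaining, implicit)) NULL else remaining
}

x <- 1:10
attr(x, "class") <- c("someclass", class(x))
attr(x, "class") <- class_without(x, "someclass")
identical(x, 1:10)  # the round trip should now give back a plain vector
```

As far as I can tell from the data.table documentation, setattr() also removes an attribute when its value is NULL, so the same computed value could presumably be passed to data.table::setattr(object$x, "class", ...) to keep the by-reference behaviour. I have not verified that combination.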
Created on 2018-08-29 by the reprex package (v0.2.0).