
I want to create a subclass for data.frame (and the like). I want a function that creates the class information and another that removes it.

df <- data.frame(x = 1:10)

This would be the “R sanctioned” way of adding classes, I think. And adding and then removing the class works as expected, returning the untouched object.

df2 <- df
class(df2$x) <- c("someclass", class(df2$x))
class(df2$x) <- class(df2$x)[class(df2$x) != "someclass"]
all.equal(df2, df)
#> [1] TRUE

But this method doesn’t work if I want to change the object by reference, because class<- seems to make a copy of the data.frame:

add.class <- function(object) {
  class(object$x) <- c("someclass", class(object$x))
  return(invisible(object))
}
add.class(df)
class(df$x)  # doesn't work
#> [1] "integer"
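For comparison, here is a minimal sketch of the same function used in the ordinary copy-and-return style, where the caller keeps the returned value. The name `df3` is only illustrative. It is not the by-reference behaviour I am after, but it shows that the class change itself is fine and is only lost because it happens on a local copy.

```r
# Same function as above: the modification happens on a local copy.
add.class <- function(object) {
  class(object$x) <- c("someclass", class(object$x))
  invisible(object)
}

df <- data.frame(x = 1:10)
df3 <- add.class(df)  # keep the modified copy that the function returns
class(df3$x)          # the returned copy carries the new class
class(df$x)           # the original is unchanged
```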

The data.table documentation says the correct way of setting attributes by reference is to use setattr and, indeed, this function does work as intended:

add.class <- function(object) {
  data.table::setattr(object$x, "class", c("someclass", class(object$x)))
  return(invisible(object))
}
add.class(df)
class(df$x)  # works!
#> [1] "someclass" "integer"

But the problem is that, when I use the same logic to remove the class, the object does not go back exactly to its previous state:

remove.class <- function(object) {
  data.table::setattr(object$x, "class", class(object$x)[class(object$x) != "someclass"])
  return(invisible(object))
}

df <- data.frame(x = 1:10)
df2 <- data.table::copy(df)
add.class(df2)
remove.class(df2)
all.equal(df, df2)  # Not equal!
#> [1] "Component \"x\": Attributes: < target is NULL, current is list >"
#> [2] "Component \"x\": target is numeric, current is integer"
all.equal(class(df$x), class(df2$x))  # But, equal classes??
#> [1] TRUE

The problem is not specific to data.table::setattr(), since the regular attr<- function has the same issue when changing the class attribute:

df <- data.frame(x = 1:10)
df2 <- data.table::copy(df)
attr(df$x, "class") <- c("someclass", class(df$x))
attr(df$x, "class") <- class(df$x)[class(df$x) != "someclass"]
all.equal(df, df2) 
#> [1] "Component \"x\": Attributes: < Modes: list, NULL >"                   
#> [2] "Component \"x\": Attributes: < Lengths: 1, 0 >"                       
#> [3] "Component \"x\": Attributes: < names for target but not for current >"
#> [4] "Component \"x\": Attributes: < current is not list-like >"            
#> [5] "Component \"x\": target is integer, current is numeric"
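To narrow down what all.equal() is complaining about, it may help to look at attributes() and oldClass() instead of class(). The sketch below uses a bare vector `x` (a hypothetical stand-in for the column) and base R only. It suggests that class() reports an implicit class for a plain integer vector, while the attr<- round trip leaves an explicit class attribute behind.

```r
x <- 1:10
is.null(attributes(x))   # a plain integer vector has no attributes at all
class(x)                 # "integer" here is the implicit class

# Same add/remove round trip as above, on the bare vector
attr(x, "class") <- c("someclass", class(x))
attr(x, "class") <- class(x)[class(x) != "someclass"]

class(x)                 # reports the same class as before ...
attributes(x)            # ... but an explicit class attribute is now stored
oldClass(x)              # oldClass() returns only the explicit attribute
oldClass(1:10)           # NULL for an untouched vector
```

If that reading is right, the two objects differ in their attribute lists even though class() gives the same answer for both.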

My guess was that class<- does something different than attr<- and setattr(). But then I realised that it only seems to happen if I change the class of a column. Changing the class of the entire object works as expected.

add.class <- function(object) {
  data.table::setattr(object, "class", c("someclass", class(object)))
  return(invisible(object))
}

remove.class <- function(object) {
  data.table::setattr(object, "class", class(object)[class(object) != "someclass"])
  return(invisible(object))
}

df <- data.frame(x = 1:10)
df2 <- data.table::copy(df)
add.class(df2)
remove.class(df2)
all.equal(df, df2) 
#> [1] TRUE

This is not such a huge problem but it really makes it a pain in the ass to properly test, since I can’t use testthat::expect_equal() for unit tests.

So, what’s the proper way of changing the class of a data.frame column by reference?

Created on 2018-08-29 by the reprex package (v0.2.0).

Elio Campitelli
  • I feel like the right approach is to know what class "someclass" is a subclass of and respond appropriately, in this case with `setattr(df2$x, "class", NULL)`, though I'm not really sure (never tried messing with this sort of thing) – Frank Aug 29 '18 at 20:33
  • You keep switching between `data.table` and `data.frame`, so I can't tell if your problem is specific to one or the other (they are very different). Normally objects in R are pass-by-value, and functions need to return updated values rather than changing data directly inside them. So with your first function you need to save the result `df3 <- add.class(df)` and use the returned value `class(df3$x)`. – MrFlick Aug 29 '18 at 20:58
  • It's all `data.frame`s. From `data.table` I just use some functions (`copy()` and `setattr()`). I've tested this with `data.frame`s (as shown), with `data.table`s and with `tibble`s. The result is always the same. I know that functions usually return values, but that's not what I'm trying to do in this case. – Elio Campitelli Aug 29 '18 at 22:08
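Frank's suggestion in the comments could be sketched roughly as follows. `class_without()` is a hypothetical helper, not part of base R or data.table. It returns NULL whenever removing "someclass" would leave nothing, or only the implicit class of the underlying vector, so that the class attribute is dropped instead of being stored explicitly. This relies on setattr() removing an attribute when its value is NULL, as described in the data.table documentation, and it has not been checked against the tibble and data.table cases mentioned above.

```r
# Hypothetical helper: the class attribute to store after dropping `cls`.
# Returns NULL when only the implicit class (or nothing) would remain.
class_without <- function(x, cls = "someclass") {
  remaining <- setdiff(oldClass(x), cls)  # explicit classes minus `cls`
  implicit <- class(unclass(x))           # class of the bare vector
  if (length(remaining) == 0L || identical(remaining, implicit)) {
    NULL
  } else {
    remaining
  }
}

remove.class <- function(object) {
  # Assumption: setattr() with value = NULL removes the attribute
  data.table::setattr(object$x, "class", class_without(object$x))
  invisible(object)
}

# Checking the helper on a bare vector (no data.table needed for this part)
x <- structure(1:10, class = c("someclass", "integer"))
class_without(x)  # NULL, so the attribute would be removed entirely
```

Columns that have a real explicit class of their own, such as factors, would keep it, because what remains after removing "someclass" differs from the implicit class of the unclassed vector.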

0 Answers