I am working with data frames of a certain structure. For the sake of example, they are all expected to have an identical set of columns and unique values of the "id" column. There is also a set of S3 generics and methods for this class of data frames, like "plot", "write" and so on.
However, if the user modifies the structure of a data frame, then semantically it no longer belongs to the initial class, although the object may still formally have the class label. Hence none of the S3 methods can be safely applied to the object.
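A minimal base-R sketch of the problem, assuming a made-up class name "zdf" and made-up columns "id" and "value" (neither is from my real code): subsetting with "[" drops a required column, yet the result keeps the class label.

```r
# Hypothetical class "zdf": columns "id" and "value", with unique ids.
x <- structure(
  data.frame(id = 1:3, value = c(10, 20, 30)),
  class = c("zdf", "data.frame")
)

# Dropping the "id" column breaks the expected structure ...
y <- x["value"]

# ... but `[.data.frame` restores the original class attribute,
# so the result is still labelled as a "zdf".
print(names(y))
print(inherits(y, "zdf"))
```

Any S3 method dispatched on "zdf" would therefore still be called on `y`, even though `y` no longer has an "id" column.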
Is it possible to somehow check the validity of the data frame each time it is modified, and if the structure changes, remove the class label? What's the correct way to modify the data frame (or for that matter any object) and preserve its structure?
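For concreteness, such a check could look roughly like the sketch below. The class name "zdf", the required columns and both helper functions are assumptions made for illustration; the open question is how to have something like this run automatically after every modification.

```r
# Hypothetical structure: exactly these columns, and unique ids.
required_cols <- c("id", "value")

# TRUE if `x` still satisfies the structural contract of the class.
is_valid_zdf <- function(x) {
  is.data.frame(x) &&
    setequal(names(x), required_cols) &&
    anyDuplicated(x$id) == 0L
}

# Strip the "zdf" label when the contract is broken; otherwise return as is.
revalidate_zdf <- function(x) {
  if (!is_valid_zdf(x)) {
    class(x) <- setdiff(class(x), "zdf")
  }
  x
}

x <- structure(
  data.frame(id = 1:3, value = c(10, 20, 30)),
  class = c("zdf", "data.frame")
)

ok  <- revalidate_zdf(x)            # structure intact, label kept
bad <- revalidate_zdf(x["value"])   # "id" column lost, label removed
```

Calling `revalidate_zdf()` by hand after each step works, but it is easy to forget, which is exactly what I would like to avoid.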
UPD: By modification I mean not only the base-R "[" setters, but also the result of applying a tidyverse pipeline, e.g. select'ing, mutate'ing, and so on. These verbs do not modify the original object; they return a modified copy that carries over the original attributes.
One option I can think of is to wrap the data frame in an R6 class:
ZDf <- R6::R6Class(
  'ZDf',
  private = list(
    .Data = NULL
  ),
  public = list(
    initialize = function(z) {
      # Do checks here
      # This duplicates set(), but in general may implement completely
      # different logic
      private$.Data <- z
      invisible(private$.Data)
    },
    set = function(z) {
      # Do checks here
      private$.Data <- z
      invisible(private$.Data)
    },
    get = function() {
      private$.Data
    }
  )
)
And use it like so:
library(dplyr)  # provides %>%, group_by() and summarise()

z <- ZDf$new(mtcars[1:10,])
z$get() %>%
  group_by(cyl) %>%
  summarise(mean(mpg)) %>%
  z$set()
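The "# Do checks here" placeholders could share one private validator, so that an invalid data frame is rejected before it replaces the stored one. This is only a sketch: the class name `ZDfChecked`, the `validate` helper and the required columns "id" and "value" are assumptions, not part of my real code.

```r
library(R6)

ZDfChecked <- R6Class(
  "ZDfChecked",
  private = list(
    .Data = NULL,
    # Hypothetical structural check: signals an error if `z` is invalid.
    validate = function(z) {
      stopifnot(
        is.data.frame(z),
        all(c("id", "value") %in% names(z)),
        anyDuplicated(z$id) == 0L
      )
      invisible(z)
    }
  ),
  public = list(
    initialize = function(z) {
      private$.Data <- private$validate(z)
    },
    set = function(z) {
      # Assignment only happens if validate() did not signal an error.
      private$.Data <- private$validate(z)
      invisible(self)
    },
    get = function() private$.Data
  )
)

z <- ZDfChecked$new(data.frame(id = 1:3, value = c(10, 20, 30)))

# A data frame with duplicated ids is rejected; the stored data stay unchanged.
res <- tryCatch(
  z$set(data.frame(id = c(1, 1), value = c(1, 2))),
  error = function(e) "rejected"
)
```

This keeps the stored data valid, but every pipeline still has to start with `get()` and end with `set()`.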
I read that S4 has some support for S3 classes, and an S4-wrapped data frame looks almost like a vanilla S3 object:
> setClass('XDf', contains = 'data.frame')
> x <- new('XDf', mtcars[1:3,1:3])
> x
Object of class "XDf"
               mpg cyl disp
Mazda RX4     21.0   6  160
Mazda RX4 Wag 21.0   6  160
Datsun 710    22.8   4  108
> x[1,]
          mpg cyl disp
Mazda RX4  21   6  160
However, it cannot be used in tidyverse pipelines:
> x %>% slice(1)
Error in `stop_vctrs()`:
! Input must be a vector, not a <XDf> object.
Run `rlang::last_error()` to see where the error occurred.