I am trying to figure out how to create a class in R with certain properties. Basically, I'd have a constructor that takes a data frame as its first argument, followed by mandatory identification of certain columns or column ranges, by name(s) or column number(s). Thereafter I want my methods to be able to refer to columns as identified in the constructor, but I want the user to still be able to refer to these columns by their original names.
Start with a dataframe:
DATE <- paste0("201", c(5,5, 6, 6))
HHWEIGHT <- c(100, 200, 100,300)
HHINCOME <- c(30, 40, 50, 20)
STUFF <- data.frame(DATE, HHWEIGHT, HHINCOME)
This is a weighted mean function, grouped by date. It uses names internal to the dataframe for grouping and to create new columns.
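library(dplyr)  # provides group_by(), mutate() and summarize()

mean_dt <- function(X){
  X <- group_by(X, DATE)
  X <- mutate(X, subtot = sum(HHWEIGHT))                  # total weight per date
  X <- mutate(X, inc_wt = (HHWEIGHT * HHINCOME)/subtot)   # each row's weighted contribution
  out <- summarize(X, mean(inc_wt))
  out
}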
mean_dt(STUFF)
Applied to the data frame above, it produces the following result:
# A tibble: 2 x 2
    DATE `mean(inc_wt)`
  <fctr>          <dbl>
1   2015       18.33333
2   2016       13.75000
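As a quick arithmetic check of the 2015 row, the same steps can be reproduced in base R on the two 2015 records. This is only a sketch to show where 18.33333 comes from; the variable names `w` and `x` are made up for the illustration. Note that `mean_dt` averages the weighted contributions within each date, so for a group of two rows it returns half of what `weighted.mean()` would give; if a conventional weighted mean is wanted, summing `inc_wt` instead of averaging it would be the usual form.

```r
w <- c(100, 200)   # HHWEIGHT values for 2015
x <- c(30, 40)     # HHINCOME values for 2015

inc_wt <- w * x / sum(w)   # per-row weighted contribution, as in mean_dt

mean(inc_wt)          # 18.33333, the value reported for 2015
sum(inc_wt)           # 36.66667
weighted.mean(x, w)   # 36.66667, same as sum(inc_wt)
```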
My goal is that after assigning a dataframe to MyClass with the constructor, I could use the assigned names internally in class functions, as if the dataframe had been renamed by the (non-NULL -- NULLs do something else) names in the constructor. So after calling this constructor:
MyClass <- function(X, dt=year, wt=NULL, inc=income){
  # the new class goes first so that it takes precedence in S3 dispatch
  class(X) <- c("MyClass", "data.frame")
  # seeking help here
  X
}
STUFF2 <- MyClass(STUFF, dt=DATE, wt=HHWEIGHT, inc=HHINCOME)
The idea is that the method below would produce the same result as mean_dt above. Note that this is simply the function above using the names from the constructor. Note also that in its current form, the function below does not work, because R, or more specifically, dplyr, cannot find the names assigned by the constructor.
MyClass.mean_dt <- function(X){
  # seeking help here
  X <- group_by(X, dt)
  X <- mutate(X, subtot = sum(wt))
  X <- mutate(X, inc_wt = (wt * inc)/subtot)
  out <- summarize(X, mean(inc_wt))
  out
}
MyClass.mean_dt(STUFF2)
The question is whether something can be added to the constructor and/or my method functions, ideally something of a standardized form, that would make MyClass.mean_dt and other methods work as if object X, here STUFF2, had had its columns renamed by the assignments in the constructor. I think it would be added at one or both of the places I have marked "seeking help here".
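To make the request concrete, here is a minimal sketch of one possible shape for such an addition, not a settled design. It assumes the column names are passed as strings rather than bare names, stores the role-to-column lookup as an attribute, and uses dplyr's `.data` pronoun inside the method. The names `new_MyClass`, `mean_dt_roles` and the `"roles"` attribute are hypothetical and exist only for this illustration.

```r
library(dplyr)

STUFF <- data.frame(
  DATE     = paste0("201", c(5, 5, 6, 6)),
  HHWEIGHT = c(100, 200, 100, 300),
  HHINCOME = c(30, 40, 50, 20)
)

# Hypothetical constructor: records which original column plays which role.
# Assumption: columns are identified by name, as character strings.
new_MyClass <- function(X, dt, wt = NULL, inc) {
  roles <- list(dt = dt, wt = wt, inc = inc)
  stopifnot(all(unlist(roles) %in% names(X)))   # unlist() drops NULL roles
  attr(X, "roles") <- roles
  class(X) <- c("MyClass", class(X))
  X
}

# Hypothetical method body: same steps as mean_dt, but each column is looked
# up through the stored roles instead of being hard-coded.
mean_dt_roles <- function(X) {
  r <- attr(X, "roles")
  dt_col <- r$dt; wt_col <- r$wt; inc_col <- r$inc
  X %>%
    group_by(dt = .data[[dt_col]]) %>%
    mutate(subtot = sum(.data[[wt_col]])) %>%
    mutate(inc_wt = (.data[[wt_col]] * .data[[inc_col]]) / subtot) %>%
    summarize(mean_inc_wt = mean(inc_wt))
}

STUFF2 <- new_MyClass(STUFF, dt = "DATE", wt = "HHWEIGHT", inc = "HHINCOME")
res <- mean_dt_roles(STUFF2)
```

In this sketch the columns of `STUFF2` keep their original names, so the user can still write `STUFF2$HHINCOME`, while the method only ever sees the roles `dt`, `wt` and `inc`. What a NULL role should do is left open here, as in the question.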
The problem has two constraints. The first is that I want the user to be able to manipulate the object using the variable names s/he is used to, which are most likely the names the columns have at the start. But I want to write my own methods based on a common set of names, not the user's idiosyncratic names.
The second is that I do not want to make the designated names available inside my methods by copying the object or any large set of its columns. My objects are large -- ten to twenty gigs (uncompressed) is not unusual -- and designated columns would typically run in the dozens, and occasionally the hundreds. My hope is nonetheless to write software that people on ordinary computers can use, so I need to keep copying to a minimum.
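One way to picture the copying constraint is a renamed "view" of just the designated columns, sketched below in base R. The helper `role_view` is hypothetical. To my understanding, selecting whole columns from a data frame and renaming them duplicates only the list of column references and not the column vectors themselves, but that is an assumption worth verifying on real data, for instance with `tracemem()` where R is built with memory profiling.

```r
STUFF <- data.frame(
  DATE     = paste0("201", c(5, 5, 6, 6)),
  HHWEIGHT = c(100, 200, 100, 300),
  HHINCOME = c(30, 40, 50, 20)
)

# Hypothetical helper: expose the designated columns under internal role names.
# `roles` maps internal name -> original column name; NULL roles are skipped.
role_view <- function(X, roles) {
  roles <- unlist(roles)                 # drops NULL entries, keeps role names
  stopifnot(all(roles %in% names(X)))
  view <- X[unname(roles)]               # column selection only, no row subsetting
  names(view) <- names(roles)            # rename the view, not the original
  view
}

view <- role_view(STUFF, list(dt = "DATE", wt = "HHWEIGHT", inc = "HHINCOME"))

names(view)    # internal names, for use inside methods
names(STUFF)   # original names, unchanged for the user
```

A method could build such a view on entry and then refer to `dt`, `wt` and `inc` directly, provided the assumption about shallow copying holds at the sizes described above.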