
I am trying to figure out how to create a class in R with certain properties. Basically, I'd have a constructor that takes a data frame as its first argument, followed by mandatory identification of certain columns or column ranges, by name(s) or column number(s). Thereafter I want my methods to be able to refer to those columns by the names assigned in the constructor, while the user can still refer to them by their original names.

Start with a dataframe:

DATE <- paste0("201", c(5, 5, 6, 6))
HHWEIGHT <- c(100, 200, 100, 300)
HHINCOME <- c(30, 40, 50, 20)

STUFF <- data.frame(DATE, HHWEIGHT, HHINCOME)

This is a weighted mean function, grouped by date. It uses names internal to the dataframe for grouping and to create new columns.

library(dplyr)

mean_dt <- function(X){
  X <- group_by(X, DATE)
  X <- mutate(X, subtot = sum(HHWEIGHT))
  X <- mutate(X, inc_wt = (HHWEIGHT * HHINCOME)/subtot)
  out <- summarize(X, mean(inc_wt))
  out
}

mean_dt(STUFF)

Applied to the data frame above, it produces the following result:

# A tibble: 2 x 2
    DATE `mean(inc_wt)`
  <fctr>          <dbl>
1   2015       18.33333
2   2016       13.75000
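As a cross-check of the arithmetic, the same grouped computation can be written in base R without dplyr. This is only a sketch; the data are recreated so the snippet runs on its own.

```r
# Example data, as in the question
DATE <- paste0("201", c(5, 5, 6, 6))
HHWEIGHT <- c(100, 200, 100, 300)
HHINCOME <- c(30, 40, 50, 20)
STUFF <- data.frame(DATE, HHWEIGHT, HHINCOME)

# Each row's group total of the weights (the role of `subtot` above)
subtot <- ave(STUFF$HHWEIGHT, STUFF$DATE, FUN = sum)
# Each row's weighted income (the role of `inc_wt` above)
inc_wt <- STUFF$HHWEIGHT * STUFF$HHINCOME / subtot
# Mean of inc_wt within each DATE group
res <- tapply(inc_wt, STUFF$DATE, mean)
res  # 2015 -> 18.33333, 2016 -> 13.75
```

Note that averaging `inc_wt` gives the weighted mean divided by the number of rows in the group. If a true weighted mean is intended, using `sum` instead of `mean` in the last step would give about 36.67 and 27.5.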

My goal is that, after the constructor has turned a data frame into a MyClass object, I could use the assigned names internally in class functions, as if the data frame's columns had been renamed to the (non-NULL; NULLs do something else) names given in the constructor. So after calling this constructor:

MyClass <- function(X, dt=year, wt=NULL, inc=income){
  class(X) <- c("data.frame", "Myclass")
  # seeking help here
  X
}

STUFF2 <- MyClass(STUFF, dt=DATE, wt=HHWEIGHT, inc=HHINCOME)
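A minimal sketch of what the constructor could do, assuming roles are given as column names (strings) or column numbers rather than bare symbols like `DATE`, and that each role is a single column (ranges are not handled). It neither renames nor changes the columns; it only records a role-to-column lookup in an attribute. The attribute name `roles` and the inner helper `resolve` are hypothetical, not part of any package.

```r
DATE <- paste0("201", c(5, 5, 6, 6))
HHWEIGHT <- c(100, 200, 100, 300)
HHINCOME <- c(30, 40, 50, 20)
STUFF <- data.frame(DATE, HHWEIGHT, HHINCOME)

# Hypothetical constructor: records which column plays which role.
# Each role is one column name (string) or column number; wt may be NULL.
# The columns themselves are neither renamed nor changed.
MyClass <- function(X, dt, wt = NULL, inc) {
  stopifnot(is.data.frame(X))
  # Resolve one column reference (name or position) to a column name
  resolve <- function(col) {
    if (is.null(col)) return(NA_character_)   # unset role
    if (is.numeric(col)) col <- names(X)[col]
    if (length(col) != 1 || is.na(col) || !(col %in% names(X)))
      stop("each role must identify exactly one existing column")
    col
  }
  attr(X, "roles") <- c(dt = resolve(dt), wt = resolve(wt), inc = resolve(inc))
  class(X) <- c("MyClass", class(X))   # subclass first, then data.frame
  X
}

# Columns may be given by name or by number (wt = 2 is HHWEIGHT)
STUFF2 <- MyClass(STUFF, dt = "DATE", wt = 2, inc = "HHINCOME")
attr(STUFF2, "roles")  # dt = "DATE", wt = "HHWEIGHT", inc = "HHINCOME"
```

Because the original column names are untouched, the user keeps working with `DATE`, `HHWEIGHT` and `HHINCOME`. Some operations may not carry a custom attribute or class through (this depends on the function and package version), so methods may want to check that `roles` is still present.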

The idea is that the method below would produce the same result as mean_dt above. Note that this is simply the function above, rewritten with the names from the constructor. Note also that in its current form the function below does not work, because R (more specifically, dplyr) cannot find the names assigned by the constructor.

MyClass.mean_dt <- function(X){
  # seeking help here
  X <- group_by(X, dt)
  X <- mutate(X, subtot = sum(wt))
  X <- mutate(X, inc_wt = (wt * inc)/subtot)
  out <- summarize(X, mean(inc_wt))
  out
}

mean_dt(STUFF2)

The question is whether something can be added to the constructor and/or my method functions, ideally something of a standardized form, that would make MyClass.mean_dt and other methods work as if object X, here STUFF2, had had its columns renamed by the assignments in the constructor. I think it would be added at one or both of the places I have marked "seeking help here."
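A sketch of the method side, assuming the constructor stored a role-to-column lookup in an attribute, here called `roles` (set directly below so the snippet is self-contained). Two points are illustrated. First, S3 dispatch needs a generic, here `mean_dt()` calling `UseMethod()`, plus a method named `generic.class`, i.e. `mean_dt.MyClass`; a function named `MyClass.mean_dt` is never dispatched to. Second, the method asks for columns by role through a small hypothetical helper `role_col()` instead of by the user's names. The computation is in base R to avoid package dependencies.

```r
DATE <- paste0("201", c(5, 5, 6, 6))
HHWEIGHT <- c(100, 200, 100, 300)
HHINCOME <- c(30, 40, 50, 20)
STUFF2 <- data.frame(DATE, HHWEIGHT, HHINCOME)
# Assumed role lookup, as a constructor might have stored it
attr(STUFF2, "roles") <- c(dt = "DATE", wt = "HHWEIGHT", inc = "HHINCOME")
class(STUFF2) <- c("MyClass", "data.frame")

# Hypothetical helper: fetch the column that plays a given role
role_col <- function(X, role) {
  roles <- attr(X, "roles")
  if (is.null(roles) || !(role %in% names(roles)) || is.na(roles[[role]]))
    stop("role '", role, "' is not set on this object")
  X[[roles[[role]]]]
}

# Generic: dispatches on the class of X
mean_dt <- function(X, ...) UseMethod("mean_dt")

# Method for MyClass objects, written only in terms of roles
mean_dt.MyClass <- function(X, ...) {
  dt  <- role_col(X, "dt")    # DATE in this example
  wt  <- role_col(X, "wt")    # HHWEIGHT in this example
  inc <- role_col(X, "inc")   # HHINCOME in this example
  subtot <- ave(wt, dt, FUN = sum)
  inc_wt <- wt * inc / subtot
  tapply(inc_wt, dt, mean)
}

res <- mean_dt(STUFF2)
res  # 2015 -> 18.33333, 2016 -> 13.75
```

Extracting a column with `[[` does not copy it unless it is later modified. Inside dplyr verbs, a column whose name is held in a string can be referred to with the `.data` pronoun, e.g. `.data[[wt_col]]` in recent dplyr versions, where `wt_col` would be a hypothetical local variable holding `roles[["wt"]]`.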

The problem has two constraints. The first is that I want users to be able to manipulate the object using the variable names they are used to, which are most likely the names the columns have at the start. But I want to write my own methods against a common set of names, not each user's idiosyncratic names.

The second is that I do not want my methods to gain access to the designated names by copying the object or any large set of its columns. My objects are large (10 to 20 GB uncompressed is not unusual), and designated columns would typically number in the dozens, occasionally the hundreds. I nonetheless hope to write software that people can use on ordinary computers, so I need to keep copying to a minimum.
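If method bodies should read literally as in the question (`dt`, `wt`, `inc`), another possibility is to rename columns only on a local copy inside the method. My understanding is that since R 3.1.0, changing a data frame's names inside a function makes a shallow copy: a new list of pointers to the same column vectors, so the column data are not duplicated. That is worth confirming with `tracemem()` on real data. The helper `with_role_names` is hypothetical and assumes the role lookup is stored in an attribute called `roles` (set directly below).

```r
DATE <- paste0("201", c(5, 5, 6, 6))
HHWEIGHT <- c(100, 200, 100, 300)
HHINCOME <- c(30, 40, 50, 20)
STUFF2 <- data.frame(DATE, HHWEIGHT, HHINCOME)
attr(STUFF2, "roles") <- c(dt = "DATE", wt = "HHWEIGHT", inc = "HHINCOME")

# Hypothetical helper: return X with each role column renamed to its role.
# Only the names of the local copy change; the caller's object keeps
# its original column names.
with_role_names <- function(X) {
  roles <- attr(X, "roles")
  roles <- roles[!is.na(roles)]            # skip roles that were left unset
  others <- setdiff(names(X), roles)       # columns that keep their names
  if (any(names(roles) %in% others))
    stop("a role name is already used by another column")
  names(X)[match(roles, names(X))] <- names(roles)
  X
}

# Inside a method, the standard names can then be used directly
Y <- with_role_names(STUFF2)
subtot <- ave(Y$wt, Y$dt, FUN = sum)
res <- tapply(Y$wt * Y$inc / subtot, Y$dt, mean)
```

The check on `others` guards against a user column that already happens to be called `dt`, `wt` or `inc`, which would otherwise produce duplicate names.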

andrewH
  • `year=year` and `income=income` are really bad ideas, you are likely to have recursive definitions of promises already under evaluation. And `mean.MyClass` should be defined as you define any normal *function*, You are not doing that. You are not defining a function. – Rui Barradas Sep 24 '17 at 18:26
  • @Rui Barradas, so, like `mean.Myclass <- function(X){out <- X %>% group_by(year) %>% mutate(wted = wt * income) %>% summarize(mean(wted)) }`? Or are you saying not to use the pipe construction? I'm still getting used to pipes and not sure of their limits. – andrewH Sep 24 '17 at 20:37
  • It **must** be `mean.MyClass`, with uppercase `C`. As for the pipe operator, I'm getting an error, `Error in grouped_df_impl(data, unname(vars), drop) : Column `year` is unknown`. – Rui Barradas Sep 24 '17 at 20:57

1 Answer


The following is a mean method for objects of class MyClass.

mean.MyClass <- function(X, year, wt, income){
    out <- lapply(split(X, X[[year]]), function(x){
            x[["wted"]] <- x[[wt]] * x[[income]]
            x[[paste("mean", wt, sep = ".")]] <- mean(x[["wted"]])
            x
        })
    out <- do.call(rbind, out)
    row.names(out) <- NULL
    out
}

MyClass <- function(X){
    class(X) <- c("MyClass", class(X))
    X
}

STUFF2 <- MyClass(STUFF)

mean(STUFF2, "DATE", "HHWEIGHT", "HHINCOME")
#  DATE HHWEIGHT HHINCOME wted mean.HHWEIGHT
#1 2015      100       30 3000          5500
#2 2015      200       40 8000          5500
#3 2016      100       50 5000          5500
#4 2016      300       20 6000          5500

Note that I am not saying you shouldn't use the pipe operator. But it seems trickier to get it to recognize `year` as a column of X than to do the same thing in base R.

Rui Barradas
  • Rui Barradas, I have changed the names of some of my variables, but applying substitutions I hope consistently (dt for year, inc_wt for wted) I got back NA and the error In mean.default(STUFF2) : argument is not numeric or logical: returning NA. Hope that is not due to a typo on my part, but I don't think so. More generally, I am looking for an approach to let any method for my class use the specified names. Unless I am misunderstanding your code, I think it is not as general as the solution I am looking for. – andrewH Sep 25 '17 at 03:24
  • @andrewH You are getting that error because sometimes you are writing `MyClass` and others `Myclass`. `R` is case-sensitive. I will edit my answer to have a function `MyClass` like in your question. – Rui Barradas Sep 25 '17 at 10:38
  • You sure? I thought I fixed all of those. Tried to anyway. Both in my revised question above, and in the test code I ran. – andrewH Sep 25 '17 at 17:41
  • @andrewH Yes, I'm sure, in your function `MyClass` you use `c("data.frame", "Myclass")`. (BTW, it should be the other way around, see my function `MyClass`.) Note also that `MyClass.mean` doesn't define a method for generic function `mean`, the way to do it is `mean.MyClass`. – Rui Barradas Sep 25 '17 at 17:51