I'm writing a function where I'd like to be able to pass in variables from a data frame as atomic vectors, like df$var (e.g., mtcars$mpg).

To keep the example very simple, say the function just returns data.frame(table(df$var)):

foo.function <- function(var) {
  data.frame(table(var))
}

head(foo.function(mtcars$mpg))
#>    var Freq
#> 1 10.4    2
#> 2 13.3    1
#> 3 14.3    1
#> 4 14.7    1
#> 5   15    1
#> 6 15.2    2

Notice that the name of the tabulated variable in the returned table is the internal name of the passed object (var) rather than its "original" name, which was mpg. Is it possible to retrieve mpg (just the name) from within the function, without changing or adding arguments? I was inclined to say no, since R is just receiving a vector of values, but I suspect R may have this capacity based on what it can do with NSE (non-standard evaluation).

lost

1 Answer

We can use deparse(substitute()) to capture the argument expression as text and then strip everything up to the $ to extract the column name:

foo.function <- function(var) {
  print(sub(".*\\$", "", deparse(substitute(var))))
  data.frame(table(var))
}

head(foo.function(mtcars$mpg), 4)
#[1] "mpg"
#   var Freq
#1 10.4    2
#2 13.3    1
#3 14.3    1
#4 14.7    1

If we need to change the column name in the output:

foo.function <- function(var) {
  nm1 <- sub(".*\\$", "", deparse(substitute(var)))
  out <- data.frame(table(var))
  names(out)[1] <- nm1
  out
}

head(foo.function(mtcars$mpg), 4)
#  mpg Freq
#1 10.4    2
#2 13.3    1
#3 14.3    1
#4 14.7    1

As @RonakShah noted in the comments, it is better to pass the column name and the data as separate arguments. If the function is limited to a single argument and that argument is always written with $, then the function above will be able to retrieve the column name.
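
To make the trade-off concrete, here is a minimal sketch of the two-argument alternative, next to a case where the single-argument version falls short. The function name `foo.byname` and its parameters `data` and `col` are made up for illustration; they are not from the question.

```r
# Hypothetical two-argument variant: `data` is a data frame and `col` is a
# column name given as a string (names are assumptions for illustration).
foo.byname <- function(data, col) {
  out <- data.frame(table(data[[col]]))
  names(out)[1] <- col  # the name is known directly, no deparsing needed
  out
}

head(foo.byname(mtcars, "mpg"), 4)

# The single-argument version from above only strips text up to a `$`.
foo.function <- function(var) {
  nm1 <- sub(".*\\$", "", deparse(substitute(var)))
  out <- data.frame(table(var))
  names(out)[1] <- nm1
  out
}

names(foo.function(mtcars$mpg))       # first name is "mpg"
names(foo.function(mtcars[["mpg"]]))  # first name is the whole expression text
```

The second call shows why the `$` form has to be guaranteed: with `[[`, there is no `$` for `sub` to match, so the deparsed expression is used unchanged as the column name.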

akrun
  • Thanks! I thought this was possible somehow. Does `deparse(substitute())` have a `tidy` equivalent? I thought I read somewhere that there was a similar function in `rlang` or `lazyeval` but I might be mistaken. – lost Dec 19 '18 at 06:58
  • I agree it's better to pass as two arguments. I'm writing the function to be able to accept both forms `function(df, var)` and `function(df$var)` so I can use both forms in the future while not having to update old code. – lost Dec 19 '18 at 07:02
  • @lost If you are passing the column name as a string, then use `sym` to convert it to a symbol and evaluate it with `!!`; if it is unquoted, convert it to a quosure with `enquo` and evaluate it with `!!`. I assume that you would be using the tidyverse way, i.e. `count`. – akrun Dec 19 '18 at 07:11
  • @lost I meant something like `foo2 <- function(data, var) { var <- enquo(var); data %>% count(!! var); }; foo2(mtcars, mpg)` – akrun Dec 19 '18 at 07:12
  • Yeah, I'd use something like that, I think. I was trying to look at the code for verbs like `dplyr::filter` for inspiration to see how they handle the variable name input. – lost Dec 19 '18 at 07:18
  • What would be the base R equivalent of that, i.e. with `table` instead of `count`? – lost Feb 02 '19 at 06:01
  • @lost Isn't the first option base R? – akrun Feb 02 '19 at 06:04
  • I meant the base R equivalent of `enquo(var); data %>% count(!! var)` if I was writing the function as `function(data, var)` with `var` unquoted. We can't do `nm <- deparse(substitute(var)); table(data$nm);` – lost Feb 04 '19 at 02:40
  • @lost try with `table(data[[nm]])` – akrun Feb 04 '19 at 06:22
  • @lost I meant `f1 <- function(dat, var) {var1 <- deparse(substitute(var)); table(dat[[var1]])}` – akrun Feb 04 '19 at 06:56
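
Putting the comment thread together, a function that accepts both `f(df$var)` and `f(df, var)` could look roughly like the sketch below. This is one possible base R arrangement under stated assumptions, not code from the thread: the name `tab.var` and the use of `missing()` to tell the two call forms apart are illustrative choices.

```r
# Hypothetical helper accepting either tab.var(df$var) or tab.var(df, var).
tab.var <- function(x, var) {
  if (missing(var)) {
    # One-argument form: x is a vector written as df$var in the call,
    # so recover the column name from the unevaluated expression.
    nm <- sub(".*\\$", "", deparse(substitute(x)))
    values <- x
  } else {
    # Two-argument form: x is the data frame and var is an unquoted
    # column name; var itself is never evaluated, only deparsed.
    nm <- deparse(substitute(var))
    values <- x[[nm]]
  }
  out <- data.frame(table(values))
  names(out)[1] <- nm
  out
}

head(tab.var(mtcars$mpg), 4)
head(tab.var(mtcars, mpg), 4)
```

Both calls should produce the same table with `mpg` as the first column name. The one-argument branch still assumes the argument is written with `$`, as noted in the answer.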