Dynamically create dataframe and variable names in R similar to macros in SAS and STATA

Question

I've been trying to find a solution for this for over a year already and decided to write a post about it. Any help would be appreciated. Here is the pseudocode that I can do easily in Stata and SAS but I don't know how to do in R. {} is the glue-like operator that was introduced into dplyr this year, so I'm using it as a placeholder for the operator that makes the pseudocode work.

library(tidyverse)
var <- "mpg"
df_name <- "mtcars"

{df_name} %>% count({var})

{df_name}_1 <- {df_name} %>% mutate(., {var}_1={var}/2)

length({df_name}_1)

should lead to

library(tidyverse)
var <- "mpg"
df_name <- "mtcars"

mtcars %>% count(mpg)

mtcars_1 <- mtcars %>% mutate(., mpg_1=mpg/2)

length(mtcars_1)

In Stata, I can easily do with local or global macros like this:

local df_name "mtcars"

then reference it as `df_name'

In SAS I can do it with global macros like this:

 %LET df_name=mtcars;

then reference it like &df_name.

Please note how visually easy it is to reference these values -- no assigns, gets, parentheses, mgets, etc .

Both approaches allow to use them in dataset names, functions, variables, etc. Simplifies my code tremendously and saves me tons of time. How to do this with visual simplicity in R? My code should be readable for people familiar with Stata/SAS (dplyr is awesome in this regard!) and too many evals, wrapping everything in functions, assigns with parentheses will just make them give up on the project or force me to change back to SAS/Stata.

I tried all combinations of {{}}, !!, enquo, sym, and the NSE and still don't know how to make this work in a visually simple way. In dplyr pipes, there is finally some workaround for the variable names but nothing for the dataframes and base R.

I would really appreciate any help in this matter! I had this problem back in 2009 with R and gave up on R until I had to come back in 2019 and still can't find an easy way to approach this.

Does `df_name` need to be a stored text variable? Why not just assign the name of the dataset to that object if you want to keep it in a simple script in a SAS-like fashion? `df_name <- mtcars` then you can do `df_name %>% count(!!var)` or `length(df_name[[var]])` ? — thelatemail, Oct 13 '20 at 23:46
I don't think there is anything like this in R (though maybe someone will show some package that does this well). I don't think you'll like this, but basically whenever you want to do this I think you should be using functions instead. This is essentially how function arguments work. When I have `foo <- rnorm(16)` and I do `mean(foo)`, the `mean` function internally refers to `foo` as `x`. When I do `lm(mpg ~ hp, data = mtcars)`, the `lm` function only uses `data`, the name or the argument, internally. R is a functional language and the way R accomplishes these ends is through functions. — Gregor Thomas, Oct 13 '20 at 23:53
Yeah, wrapping up everything in function will make everything very hard to work with and to debug. This is all academic research not production-ready stuff. — astrae_research, Oct 14 '20 at 00:09
Respectfully, it's a paradigm shift. Production code doesn't use functions to make things difficult to work with and debug. Production code uses functions (and preferably many small functions, not one big one) because it becomes easier to test, debug, and work with. Case in point, you're struggling with this, but it's a non-issue if you use functions. If you switch from skiing to snowboarding, you have to let go of the poles at some point too. — Gregor Thomas, Oct 14 '20 at 00:13
@thelatemail In SAS every DATA STEP can lead to a &df_name._1 new file so that the previous and the new files exist and can be compared, debugged and what not. How do I do {df_name}_1 <- {df_name} %>% mutate(., {var}_1={var}/2) using your approach? I actually do not want to have df_name named dataframe in the environment only the mtcars_* ones. Thank you for the reply! I need to be absolutely sure it can't be done though. — astrae_research, Oct 14 '20 at 00:15
@GregorThomas Can you solve this using functions please? This could be an interesting example. — astrae_research, Oct 14 '20 at 00:16
I should be working now... I'll try to find time later tonight to write up how I would do this with functions. — Gregor Thomas, Oct 14 '20 at 00:24

score 5 · Answer 1 · answered Oct 14 '20 at 02:48

R is a functional programming language. You would use meta-programming to create code rather than using macro style text replacement like SAS to create new features. Variable names in R are symbols, not strings. Trying to force these two different programming paradigms to look the same is generally not a good idea and leads to very un-idiomatic in one of the two languages.

With R it's better to track the data itself rather than the name of the data and track the column you want to use with a symbol.

library(tidyverse)

df <- mtcars
var <- rlang::sym("mpg")

Then you can just do

df %>% count(!!var)

or create a function

get_counts <- function(data, x) {
    data %>% count({{x}})
}
get_counts(df, mpg)   #use actual column name
get_counts(df, !!var) #or use name from variable with !!

You can name new variables using := and some glue style sytnax with dplyr

df %>% mutate(., "{var}_1" := !!var/2)

which you could also put into a function

create_new_var <- function(data, x) {
  data %>% mutate(., "{{x}}_1":={{x}}/2)
}
create_new_var(df, mpg)
create_new_var(df, !!var)

And then rather than creating variables with numeric indexes in their name, you would just use pipes

df %>% 
   create_new_var(!!var) %>%
   length()

Or if you have multiple values to work with, you keep values in named lists. And then you can map functions over the lists. For eample

df_name <- "mtcars"
data <- mget(df_name, inherits = TRUE)
fixed <- map(data, ~create_new_var(., !!var))
lens <- map(fixed, ~length(.))
lens$mtcars
# [1] 12
lens[[df_name]]
# [1] 12

This would also work if you had multiple data.frames stored in the data list

I would strongly discourage you from using something like this, but you could define a new type of assignment that would allow for string-line variable names. For example

`%<-%` <-  function(x, value) {
  varname <- glue::glue(x, .envir = parent.frame())
  invisible(assign(varname, value, envir = parent.frame()))
}

df_name <- "mtcars"
"{df_name}_1" %<-% { get(df_name) %>% create_new_var(!!var) }

Here we defined %<-% rather than <- that will take string-like names and expand them and turn them into new variable. And we use get() to get values for variables with strings. Note that we need {} around the expression we want to assign to the new value because we can't control the precedence of our new operator. It has the same precedence as the pipe operator so everything would go left to right by default. This is not something a proper R programming would likely do though.

This last block of code is the only one that solves the dynamic LHS dataname assignment! This is very un-R and I understand the implications but this is the first time when the answer actually hit the bulls-eye! Thank you! What is the fundamental reason we can't use {} in df %>% mutate(., "{var}_1" := !!var/2) but for the LHS of the `<-` like you did in the last block? — astrae_research, Oct 15 '20 at 05:40

IRTFM · Answer 2 · 2020-10-14T02:40:59.580

I do not see that needing to use get in R in place of needing to use LET in SA is any more difficult. The dplyr mechanism for turning named character values into R names or "language"-expressions is the !! operator, so this seems to be an exact implementation of the first part of your request:

 library(tidyverse)
 var <- "mpg"
 df_name <- get("mtcars")  # this will create a separate copy
 
 
 df_name %>% count(!!var)
#  "mpg"  n
#1   mpg 32

You will need to use an assign operation for the next parts, though. I don't think dplyr has gone the route of trying to emulate the macro character of SAS which most experienced R users would consider as an abomination. There was an effort about 10 years ago on the part of one of the R-greats to illustrate how R could do macro-processing, but I think he did it more as to show why it should not be done than with any intent to make R more SAS-like. (I'll now go out and look for the article.)

Here's the document. It is closer to 20 years ago. Go to page 11.

The variable part thank to bang bang which was introduced very recently now works. The dataframe part is still causing trouble. So only assign works with the datframe name? — astrae_research, Oct 14 '20 at 04:10

Dynamically create dataframe and variable names in R similar to macros in SAS and STATA

2 Answers2