I'm trying to write a function that automates the creation of some new variables using tidyverse tools. I figured out that my problem involves tidyeval, but I haven't quite worked out where I went wrong in the code below, which just reproduces the variable name. As a second step, I'd like to use something other than a for loop to apply the function a bunch of times. I've read enough StackOverflow answers shaming for loops, but I can't find a worked example of using some kind of apply function to create new variables in an existing data frame. Thanks!

library(tidyverse)
x = c(0,1,2,3,4)
y = c(0,2,4,5,8)
df <- data.frame(x,y)
df
simple_func <- function(x) {
  var_name <- paste0("pre_", x, "_months")
  var_name <-  enquo(var_name)
  df <- df %>%
    mutate(!! var_name := ifelse(x==y,1,0)) %>%
    mutate(!! var_name := replace_na(!! var_name))
  return(df)
}
simple_func(1)
#Desired result
temp <- data.frame("pre_1_months" = c(1,0,0,0,0))
temp
bind_cols(df,temp)

#Step 2, use some kind of apply function rather than a loop to apply this function sequentially
nums <- seq(1:10)
for (i in seq_along(nums)) {
  df <- simple_func(nums[i])
}
df
Jack Landry

2 Answers

As var_name is a string, we can use sym to convert it to a symbol and then evaluate it with !!:

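simple_func <- function(x) {
  var_name <- paste0("pre_", x, "_months")  # a character string
  var_name <- rlang::sym(var_name)           # now a symbol that !! can unquote
  # df comes from the global environment; x and y inside mutate() are its columns
  df %>%
    mutate(!! var_name := ifelse(x==y,1,0)) %>%
    mutate(!! var_name := replace_na(!! var_name))
}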

Checking with the OP's code:

nums <- seq(1:10)
for (i in seq_along(nums)) {
   df <- simple_func(nums[i])
 }

df
#  x y pre_1_months pre_2_months pre_3_months pre_4_months pre_5_months pre_6_months pre_7_months pre_8_months
#1 0 0            1            1            1            1            1            1            1            1
#2 1 2            0            0            0            0            0            0            0            0
#3 2 4            0            0            0            0            0            0            0            0
#4 3 5            0            0            0            0            0            0            0            0
#5 4 8            0            0            0            0            0            0            0            0
#  pre_9_months pre_10_months
#1            1             1
#2            0             0
#3            0             0
#4            0             0
#5            0             0
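For reference, the string-to-symbol step can be tried on its own. This is a minimal sketch assuming dplyr and rlang are installed; name_str, name_sym and out are illustrative names, and a fresh copy of df is built so the example does not depend on the loop above.

```r
library(dplyr)

# Fresh copy of the question's data
df <- data.frame(x = c(0, 1, 2, 3, 4), y = c(0, 2, 4, 5, 8))

# paste0() builds a character string, not a reference to anything
name_str <- paste0("pre_", 1, "_months")

# rlang::sym() converts the string into a symbol ...
name_sym <- rlang::sym(name_str)

# ... which !! splices into mutate() as the name of the new column
out <- df %>%
  mutate(!!name_sym := ifelse(x == y, 1, 0))

print(out)
```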

We could also use map_dfc and change mutate to transmute, so that each call returns only the new column, which is then bound to df:

simple_func <- function(x) {
  var_name <- paste0("pre_", x, "_months")
  var_name <- rlang::sym(var_name)
  # transmute() keeps only the new column, so map_dfc() can bind the results
  df %>%
    transmute(!! var_name := ifelse(x==y,1,0)) %>%
    transmute(!! var_name := replace_na(!! var_name))
}

library(purrr)
library(dplyr)
map_dfc(1:10, simple_func) %>% 
       bind_cols(df,.)
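The function above still reads the global df. A variant that takes the data frame as an argument can pass it through map_dfc()'s extra arguments instead. This is a sketch assuming dplyr and purrr are installed; flag_col, data and k are hypothetical names, not part of the answer.

```r
library(dplyr)
library(purrr)

df <- data.frame(x = c(0, 1, 2, 3, 4), y = c(0, 2, 4, 5, 8))

# Hypothetical helper: returns a one-column data frame named pre_<k>_months.
# transmute() drops x and y, so map_dfc() can bind the results side by side.
flag_col <- function(data, k) {
  data %>%
    transmute(!!rlang::sym(paste0("pre_", k, "_months")) := ifelse(x == y, 1, 0))
}

# data = df is forwarded to every call, so 1:3 fills the k argument
new_cols <- map_dfc(1:3, flag_col, data = df)
result <- bind_cols(df, new_cols)
print(result)
```

In recent dplyr releases transmute() is superseded by mutate(.keep = "none"), but it still works.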
akrun

To build on @akrun's answer, a more idiomatic approach is to pass df as the first parameter of your function and x as the second. You can vectorize the function by putting the loop inside it, so that it runs once for each element of x, using rlang::syms instead of sym. This also makes the code shorter, and you can use the function in a pipe as if it were a dplyr verb.

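simple_func <- function(df, x) {
  # syms() turns each name into a symbol; the loop adds one column per name.
  # Inside mutate(), x and y refer to the columns of df, not the argument x.
  for (var_name in rlang::syms(paste0("pre_", x, "_months"))) {
    df <- mutate(df, !! var_name := replace_na(ifelse(x==y,1,0)))
  }
  df
}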

So now you can do:

df %>% simple_func(1:5)
#>   x y pre_1_months pre_2_months pre_3_months pre_4_months pre_5_months
#> 1 0 0            1            1            1            1            1
#> 2 1 2            0            0            0            0            0
#> 3 2 4            0            0            0            0            0
#> 4 3 5            0            0            0            0            0
#> 5 4 8            0            0            0            0            0
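To see what the loop iterates over, here is a small sketch assuming dplyr is installed. add_flags and its argument k are hypothetical names: in the answer the argument is also called x, and inside mutate() the data frame's column x takes precedence over it, which is why x == y compares the two columns. Renaming the argument just makes that explicit.

```r
library(dplyr)

df <- data.frame(x = c(0, 1, 2, 3, 4), y = c(0, 2, 4, 5, 8))

# syms() is the vectorised version of sym(): a list with one symbol per string
name_syms <- rlang::syms(paste0("pre_", 1:3, "_months"))
str(name_syms)

# Hypothetical variant of the answer's function with the argument renamed to k,
# so it cannot be confused with the column x used inside mutate()
add_flags <- function(df, k) {
  for (var_name in rlang::syms(paste0("pre_", k, "_months"))) {
    df <- mutate(df, !!var_name := ifelse(x == y, 1, 0))
  }
  df
}

out <- df %>% add_flags(1:3)
print(out)
```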

EDIT

Following Lionel Henry's comment, and noting the OP's desire to avoid loops, here is a single loop-free function that can be used in a pipe with an x of any length and that doesn't rely on converting the names to symbols:

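simple_func <- function(df, x) {
  # f() adds one column; <<- assigns to the df in simple_func's environment
  f <- function(v) df <<- mutate(df, !!v := replace_na(ifelse(x == y, 1, 0)))
  lapply(paste0("pre_", x, "_months"), f)  # called only for its side effect
  return(df)
}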

This works the same way:

df %>% simple_func(1:10)
#>   x y pre_1_months pre_2_months pre_3_months pre_4_months pre_5_months pre_6_months
#> 1 0 0            1            1            1            1            1            1
#> 2 1 2            0            0            0            0            0            0
#> 3 2 4            0            0            0            0            0            0
#> 4 3 5            0            0            0            0            0            0
#> 5 4 8            0            0            0            0            0            0
#>   pre_7_months pre_8_months pre_9_months pre_10_months
#> 1            1            1            1             1
#> 2            0            0            0             0
#> 3            0            0            0             0
#> 4            0            0            0             0
#> 5            0            0            0             0
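The lapply() version relies on assigning with <<- from inside the helper. An alternative that avoids the side effect, swapped in here rather than taken from the answer, is to fold the names over the data frame with purrr::reduce(), which hands the result of each mutate() to the next one. This is a sketch assuming dplyr and purrr are installed; add_flags_reduce and k are hypothetical names.

```r
library(dplyr)
library(purrr)

df <- data.frame(x = c(0, 1, 2, 3, 4), y = c(0, 2, 4, 5, 8))

# Hypothetical loop-free helper: reduce() starts from df (.init) and applies
# one mutate() per name, passing each result on as the next accumulator
add_flags_reduce <- function(df, k) {
  reduce(
    paste0("pre_", k, "_months"),
    function(acc, nm) mutate(acc, !!nm := ifelse(x == y, 1, 0)),
    .init = df
  )
}

out <- df %>% add_flags_reduce(1:10)
print(out)
```

Base R's Reduce(f, names, df) does the same fold without purrr, with df as the initial value.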

Allan Cameron
  • If you're not using the names to do computations, there is no need to transform them to symbols. They can be simple character vectors because you can unquote strings on the LHS of `:=`. Also you can now use glue interpolation of strings, so that would be: `"{var_name}" := replace_na(...)` – Lionel Henry May 05 '20 at 08:02
  • @LionelHenry thanks - I had missed that. I have added an edited version with acknowledgement. – Allan Cameron May 05 '20 at 08:25
  • Thanks! I would also mention that this only works because we are creating new names (on the LHS). To refer to columns in computations (on the RHS), the strings must be transformed to symbols so that they represent the columns and not... strings. – Lionel Henry May 05 '20 at 09:15
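Both comments can be checked with a short sketch, assuming dplyr 1.0 or later (for the "{var_name}" glue syntax on the left of :=). df2, df3 and df4 are illustrative names. A string or glue template is enough to name a new column; to read a column whose name is held in a string, the code has to go through .data[[ ]] (or rlang::sym()), since unquoting the bare string just inserts a constant.

```r
library(dplyr)

df <- data.frame(x = c(0, 1, 2, 3, 4), y = c(0, 2, 4, 5, 8))
var_name <- paste0("pre_", 1, "_months")

# LHS: glue-style interpolation of a string names the new column
df2 <- df %>% mutate("{var_name}" := ifelse(x == y, 1, 0))

# RHS: .data[[ ]] looks the string up as a column of the data
df3 <- df2 %>% mutate(doubled = .data[[var_name]] * 2)

# RHS without conversion: the unquoted string is just a constant value
df4 <- df2 %>% mutate(copy = !!var_name)

print(df3)
print(df4)
```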