For loop with dplyr pipeline: problem using dynamic and date variables correctly

Question

I have the below code and example data. I have two issues:

1. The new variable created with mutate is named "New_var" in the resulting data frames rather than the character string (e.g., df1_timediff) that I assigned to it within the for loop. Based on answers to similar questions, I have tried using eval, as.name, and as.character, both when defining New_var and within the pipeline, but with no luck. When I check the class of New_var, R tells me it is "character".
2. I would like New_var to hold the time difference between the current entry and the first entry for the corresponding participant. I have used similar code previously; however, the values are not as expected: the time difference returned is not the number of months between entries. The Submitted_i variables are of class Date, so I'm confused about why this happens.

Code

names.dfs <- c("df1", "df2", "df3")

for (i in names.dfs){

  Submitted_i <- as.name(paste0('Submitted_', i))
  New_var <- as.name(paste0(i,'_timediff'))
  
  df_i <-  get(i)
  
  df_i <- df_i %>%
        arrange(eval(Submitted_i)) %>% # Order by date
        group_by(ResultsID) %>% 
        mutate(New_var = (time_length(difftime(eval(Submitted_i), eval(Submitted_i)[1],"months")))) 
               
  assign(paste0(i),df_i)

  }

Example Data


df1 <- structure(list(ResultsID = c(1, 2, 3, 4, 2, 4, 1, 5, 3, 3), RepeatNo = c(0L, 
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L), Submitted_df1 = structure(c(17509, 
17509, 17514, 17484, 17929, 17484, 17502, 17528, 17497, 17488
), class = "Date")), row.names = c(NA, 10L), class = "data.frame")
  
df2 <- structure(list(ResultsID = c(1, 5, 1, 3, 2, 4, 5), RepeatNo = c(0L, 
0L, 0L, 0L, 0L, 0L, 0L), Submitted_df2 = structure(c(16856, 16858, 
16869, 16861, 16875, 16888, 16891), class = "Date")), row.names = c(NA, 
7L), class = "data.frame")
  
df3 <- structure(list(ResultsID = c(1, 2, 3, 1, 2, 4, 4, 5, 3), RepeatNo = c(0L, 
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L), Submitted_df3 = structure(c(17913, 
17930, 17919, 17931, 17921, 17912, 17916, 17931, 17915), class = "Date")), row.names = c(NA, 
-9L), groups = structure(list(.rows = structure(list(1L, 2L, 
    3L, 4L, 5L, 6L, 7L, 8L, 9L), ptype = integer(0), class = c("vctrs_list_of", 
"vctrs_vctr", "list"))), row.names = c(NA, -9L), class = c("tbl_df", 
"tbl", "data.frame")), class = c("rowwise_df", "tbl_df", "tbl", 
"data.frame"))

For your first problem: use `{{ New_var }} := ` instead. – Martin Gal, Aug 24 '21 at 09:22
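
As a rough illustration of what this comment is getting at, the sketch below contrasts `=` with `:=` in mutate. The toy data frame and the names `toy`, `prefix` and `new_sym` are made up for the example. It uses `!!` rather than `{{ }}`, on the assumption that the name is already stored as a symbol or string rather than passed in as a function argument, and it also shows the glue-style form as another option.

```r
library(dplyr)

# Hypothetical toy data, not the data frames from the question
toy <- data.frame(id = c(1, 1, 2), value = c(10, 15, 7))

prefix  <- "df1"                                  # plays the role of `i` in the loop
new_sym <- as.name(paste0(prefix, "_timediff"))   # symbol df1_timediff

# With `=`, mutate() uses the literal text on the left as the column name
literal <- toy %>% mutate(new_sym = value * 2)

# With `:=`, the left-hand side may be injected, so the column is named
# after the symbol stored in `new_sym`
injected <- toy %>% mutate(!!new_sym := value * 2)

# Glue-style names build the same column name directly from the string
glued <- toy %>% mutate("{prefix}_timediff" := value * 2)

names(literal)
names(injected)
names(glued)
```

Comparing the three `names()` results shows which forms pick up the dynamic name and which one keeps the literal name.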

score 2 · Accepted Answer · answered Aug 24 '21 at 09:56

Your second issue comes from a misplaced closing bracket. In your code, "months" is passed as the third argument of the difftime function (its tz argument) rather than as the unit argument of the time_length function. Once that is fixed and the suggestion from Martin Gal's comment is applied, it works fine:

library(lubridate)
library(dplyr)

names.dfs <- c("df1", "df2", "df3")

for (i in names.dfs){

  Submitted_i <- as.name(paste0('Submitted_', i))
  New_var <-  as.name(paste0(i,'_timediff'))

  df_i <-  get(i)

  df_i <- df_i %>%
    arrange(eval(Submitted_i)) %>% # Order by date
    group_by(ResultsID) %>% 
    mutate({{New_var}} := time_length(
                               difftime(
                                   eval(Submitted_i),
                                   eval(Submitted_i)[1]
                               ),
                               "months"
                           ) 
     )

  assign(paste0(i),df_i)

}
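
To make the effect of the bracket placement easier to see, here is a minimal sketch on two dates taken from the example data. The variable names are illustrative only. It assumes that, with the misplaced bracket, time_length received no unit and therefore fell back to its default unit (seconds), which would explain values that look far too large for months.

```r
library(lubridate)

first_date <- as.Date("2017-11-18")
later_date <- as.Date("2017-11-27")

# difftime() on two Dates: a gap of 9 days
gap <- difftime(later_date, first_date)

# No unit passed to time_length(): the result is in its default unit, seconds
in_seconds <- time_length(gap)

# Unit passed to time_length(): the same gap expressed in months
in_months <- time_length(gap, unit = "months")

in_seconds
in_months
```

Because a difftime carries no calendar context, the month value here is based on an average month length; working from an interval instead gives calendar-aware months.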

score 1 · Answer 2 · answered Aug 24 '21 at 10:28

In my opinion, you should consider storing your data.frames in a list of data.frames. If you find yourself needing get/assign structures, there are usually more elegant ways.

Next, you could use one of purrr's map functions to apply your workflow to those data.frames (imap here, which also passes each element's name as .y). Inside the map function, I recommend renaming the columns to avoid the curly-curly and as.name structures:

library(dplyr)
library(lubridate)
library(purrr)

# create a named list of data.frames
my_list <- list(df1, df2, df3)
names(my_list) <- c("df1", "df2", "df3")

# apply your workflow
my_result_list <- my_list %>% 
  imap(~ .x %>% 
         tibble() %>% 
         # ungroup() %>% 
         `names<-`(., sub("_df.*", "", names(.))) %>% 
         arrange(Submitted) %>%
         group_by(ResultsID) %>% 
         # replace / months(1) by %/% months(1) if you want full months, or use a rounding function
         mutate(difftime = interval(first(Submitted), Submitted) / months(1)) %>% 
         rename_with(function(x) paste0("Submitted_", .y), starts_with("Submitted")) %>% 
         rename_with(function(x) paste0(.y, "_difftime"), ends_with("difftime")) %>% 
         ungroup()
  )

This returns a list of data.frames like this:

$df1
# A tibble: 10 x 4
   ResultsID RepeatNo Submitted_df1 df1_difftime
       <dbl>    <int> <date>               <dbl>
 1         4        0 2017-11-14           0    
 2         4        0 2017-11-14           0    
 3         3        0 2017-11-18           0    
 4         3        0 2017-11-27           0.3  
 5         1        0 2017-12-02           0    
 6         1        0 2017-12-09           0.226
 7         2        0 2017-12-09           0    
 8         3        0 2017-12-14           0.867
 9         5        0 2017-12-28           0    
10         2        0 2019-02-02          13.8  

$df2
# A tibble: 7 x 4
  ResultsID RepeatNo Submitted_df2 df2_difftime
      <dbl>    <int> <date>               <dbl>
1         1        0 2016-02-25           0    
2         5        0 2016-02-27           0    
3         3        0 2016-03-01           0    
4         1        0 2016-03-09           0.448
5         2        0 2016-03-15           0    
6         4        0 2016-03-28           0    
7         5        0 2016-03-31           1.13 

$df3
# A tibble: 9 x 4
  ResultsID RepeatNo Submitted_df3 df3_difftime
      <dbl>    <int> <date>               <dbl>
1         4        0 2019-01-16           0    
2         1        0 2019-01-17           0    
3         3        0 2019-01-19           0    
4         4        0 2019-01-20           0.129
5         3        0 2019-01-23           0.129
6         2        0 2019-01-25           0    
7         2        0 2019-02-03           0.290
8         1        0 2019-02-04           0.581
9         5        0 2019-02-04           0

Now you can work with your data.frames like this: my_result_list[[1]] returns your transformed df1, my_result_list[[2]] returns df2, etc. Since the list is named, my_result_list$df1 works as well.
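
As a small sketch of the list-based approach in base R only, the example below builds the named list with mget instead of typing the names twice, applies one transformation to every element, and retrieves results by name. The three tiny data frames and the column `x` are made up for illustration and stand in for the real df1, df2 and df3.

```r
# Hypothetical small data frames standing in for df1, df2, df3
df1 <- data.frame(ResultsID = c(1, 2), x = c(10, 20))
df2 <- data.frame(ResultsID = c(1, 3), x = c(30, 40))
df3 <- data.frame(ResultsID = c(2, 3), x = c(50, 60))

# mget() looks objects up by name and returns them as a named list,
# which replaces the get()/assign() pair used in the loop
my_list <- mget(c("df1", "df2", "df3"))

# Apply the same transformation to every element; list names are preserved
my_result_list <- lapply(my_list, function(d) transform(d, x2 = x * 2))

# Elements can be retrieved by name as well as by position
names(my_result_list)
my_result_list$df2
```

Keeping the results in one list means later steps can again be written once and mapped over all data frames, rather than repeated per object.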
