I am really new to dealing with evaluation issues in R. Here is an example data frame:

library(dplyr)

example <- data.frame(id = 1:5,
                      pairs0 = c(1, 1, 1, 2, 2),
                      pairs1 = c(2, 2, 1, 1, 1))

Here is the function that I am trying to write:

f <- function(df, col_pair){

  df2 <- df %>% mutate(j = row_number())

  full_join(df2 %>% select(j, col_pair),
            df2 %>% select(j, col_pair),
            suffix = c('1', '2'),
            by = "{{col_pair}}",
            keep = TRUE) %>%
    filter(j1 != j2)
}

The function takes a data frame df and joins it to itself on the column col_pair. However, when I run f(example, pairs0), I get the error "join columns must be present in data".

Can someone help?

1 Answer

I have modified your function. One option is to pass the column name as a quoted string, which can be less troublesome than other evaluation schemes. Here is the code:

#Function
f <- function(df, col_pair){
  
  df2 <- df %>% mutate(j = row_number())
  
  full_join(df2 %>% select(j, col_pair),
            df2 %>% select(j, col_pair),
            suffix = c('1', '2'),
            by = col_pair,
            keep = TRUE) %>%
    filter(j1 != j2)
}
#Apply
f(example, 'pairs0')

Output:

  j1 pairs01 j2 pairs02
1  1       1  2       1
2  1       1  3       1
3  2       1  1       1
4  2       1  3       1
5  3       1  1       1
6  3       1  2       1
7  4       2  5       2
8  5       2  4       2
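
With a string argument, recent versions of tidyselect may print a message that using an external vector in selections is ambiguous. That message comes from the `select()` calls, not from `by`, which already expects a character vector. Below is a minimal sketch of the string version with `all_of()` inside `select()`. The name `f_str` is made up here for illustration, and behavior may differ across dplyr versions (newer releases may also warn about a many-to-many join):

```r
library(dplyr)

example <- data.frame(id = 1:5,
                      pairs0 = c(1, 1, 1, 2, 2),
                      pairs1 = c(2, 2, 1, 1, 1))

# Hypothetical variant of f(): the column name arrives as a string.
f_str <- function(df, col_pair) {
  # all_of() marks col_pair as a character vector of column names,
  # so select() does not have to guess whether it is a column.
  df2 <- df %>%
    mutate(j = row_number()) %>%
    select(j, all_of(col_pair))

  # `by` takes the string directly; no wrapper is needed here.
  full_join(df2, df2,
            suffix = c("1", "2"),
            by = col_pair,
            keep = TRUE) %>%
    filter(j1 != j2)
}

res_str <- f_str(example, "pairs0")
```

The call still uses a quoted name, so the result should match the output shown above.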

Also, if non-standard evaluation is needed (passing the column name without quotes), you can use this:

#Option 2
f <- function(df, col_pair){
  
  var <- enquo(col_pair)
  
  df2 <- df %>% mutate(j = row_number())
  
  full_join(df2 %>% select(j, !!var),
            df2 %>% select(j, !!var),
            suffix = c('1', '2'),
            by = rlang::as_name(var),
            keep = TRUE) %>%
    filter(j1 != j2)
}

We apply:

#Apply
f(example, pairs0)

Output:

  j1 pairs01 j2 pairs02
1  1       1  2       1
2  1       1  3       1
3  2       1  1       1
4  2       1  3       1
5  3       1  1       1
6  3       1  2       1
7  4       2  5       2
8  5       2  4       2
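
The original attempt failed because `"{{col_pair}}"` inside ordinary quotes is just a literal string: nothing interpolates it, so `full_join()` looks for a column literally named `{{col_pair}}`. The embrace operator only works inside tidy-eval arguments such as those of `select()`. Below is a minimal sketch that uses `{{ }}` for the selection and converts the argument to a string for `by`. The name `f_embrace` is made up for illustration, and this assumes a dplyr/rlang version that supports `{{ }}`:

```r
library(dplyr)

example <- data.frame(id = 1:5,
                      pairs0 = c(1, 1, 1, 2, 2),
                      pairs1 = c(2, 2, 1, 1, 1))

# Hypothetical variant of f(): the column name arrives unquoted.
f_embrace <- function(df, col_pair) {
  # `by` needs a character vector, so capture the argument
  # and turn it into its name as a string.
  by_name <- rlang::as_name(rlang::enquo(col_pair))

  # Inside select(), the embrace operator forwards the column.
  df2 <- df %>%
    mutate(j = row_number()) %>%
    select(j, {{ col_pair }})

  full_join(df2, df2,
            suffix = c("1", "2"),
            by = by_name,
            keep = TRUE) %>%
    filter(j1 != j2)
}

res_embrace <- f_embrace(example, pairs0)
```

This is equivalent in intent to the `enquo()`/`!!` version; which style to prefer is mostly a matter of taste and package version.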
Duck
  • Thank you! Could you please take a look at the updated question? I want to join only on the `col_pair` column, and I am having trouble with the `by` argument in `full_join` – Arthur Carvalho Brito Sep 24 '20 at 17:38
  • @Duck do you mean `enquo` and `!!` with new evaluation standard? Reading [how to program with `dplyr`](https://dplyr.tidyverse.org/articles/programming.html), I thought `{{}}` is the new kid in town? – starja Sep 24 '20 at 17:42
  • @starja Yes, both are the new kids in town for functions in `dplyr`, at least from what I have seen. Sorry if I was wrong and confused you! – Duck Sep 24 '20 at 18:20
  • @ArthurCarvalhoBrito I have added an update for what you want; maybe it will be helpful for you! – Duck Sep 24 '20 at 18:29
  • @Duck thank you! Just one more thing: I get this message after running the real function that applies your solution: ```Using an external vector in selections is ambiguous. i Use `all_of(col_pair)` instead of `col_pair` to silence this message.``` Should I actually follow this? Where should the replacements go? – Arthur Carvalho Brito Sep 24 '20 at 18:40
  • @ArthurCarvalhoBrito It is a warning. It points out that newer functions from dplyr can do the task. Try replacing the `by=col_pair` part with `by = all_of(col_pair)`. Results are the same. I have tested it :) – Duck Sep 24 '20 at 18:44
  • @ArthurCarvalhoBrito Also, I have dug deeper and added another version of your function using non-standard evaluation, which means no quotes. So you have two options if needed. Same outputs! – Duck Sep 24 '20 at 19:04