I am trying to pass a set of variable/value pairs stored in a data.frame to a map function, but I am not sure how to handle the fact that .x holds a column name that has to be evaluated as a column inside filter(): mutate(df2 = map2(variable, value, ~filter(df1, .x==.y))). A naive !!.x does not work.

Here my data.frame has one column for variable, one for value, that will be mapped in a filter call:

tibble(variable=c("wool", "tension"), 
       value= c("A", "L")) 
#> # A tibble: 2 x 2
#>   variable value
#>   <chr>    <chr>
#> 1 wool     A    
#> 2 tension  L

How can I pass these to filter? Should I declare variable as a quosure instead? I tried a few approaches:

library(tidyverse)
data(warpbreaks)

tibble(variable=c("wool", "tension"), 
       value= c("A", "L")) %>% 
  mutate(data_filtered=map2(variable, value, ~filter(warpbreaks, .x==.y)))
#> # A tibble: 2 x 3
#>   variable value data_filtered       
#>   <chr>    <chr> <list>              
#> 1 wool     A     <data.frame [0 × 3]>
#> 2 tension  L     <data.frame [0 × 3]>

tibble(variable=c(quo(wool), quo(tension)), 
       value= c("A", "L")) %>% 
  mutate(data_filtered=map2(variable, value, ~filter(warpbreaks, eval_tidy(.x)==.y)))
#> Error in eval_tidy(.x): object 'wool' not found
Spacedman
Matifou
  • Are you looking for `~ filter(df1, !!sym(.x) == .y)`? This transforms your variable name to a symbol (i.e. something that looks like an R variable), and inserts it inside the `==` expression with `!!`. – Lionel Henry May 07 '19 at 15:44
  • What's the expected output you are trying to create? I'm struggling to understand what a filter nested in a map2 nested in a mutate is trying to achieve. – Spacedman May 07 '19 at 16:24
  • @LionelHenry Did you get that part working? I tried with the two solutions (variable is string, variable is quosure) but don't get it working? – Matifou May 07 '19 at 17:26
  • @Spacedman Expected output has data_filtered with the corresponding filtered rows (not 0 rows). Big picture is me trying something like `nest()` output, but with overlapping groups. – Matifou May 07 '19 at 17:27

3 Answers


Something weird goes on with the evaluation of .x in the anonymous function. To be honest I'm not sure what, but defining a function outside of the map2 call seems to work all right (credit to @Lionel Henry for the ~ filter(df1, !!sym(.x) == .y) bit):

library(tidyverse)

df <- tibble(variable=c("wool", "tension"), 
       value= c("A", "L")) 

data(warpbreaks)

# doesn't work with anonymous function
tibble(variable=c("wool", "tension"), 
       value= c("A", "L")) %>% 
  mutate(data_filtered=map2(variable, value, ~ filter(warpbreaks, !!sym(.x) == .y)))
#> Error in is_symbol(x): object '.x' not found

# works when you define function outside of map2
temp <- function(x, y, data){
  filter(data, !!sym(x) == y)
}

tibble(variable=c("wool", "tension"), 
       value= c("A", "L")) %>% 
  mutate(data_filtered=map2(variable, value, temp, warpbreaks))
#> # A tibble: 2 x 3
#>   variable value data_filtered        
#>   <chr>    <chr> <list>               
#> 1 wool     A     <data.frame [27 x 3]>
#> 2 tension  L     <data.frame [18 x 3]>

Created on 2019-05-07 by the reprex package (v0.2.1)

You can also do the following without the externally defined function:

tibble(variable=c("wool", "tension"), 
       value= c("A", "L")) %>% 
  mutate(data_filtered = map2(variable, value, ~ filter(..3, ..3[[..1]] == ..2), warpbreaks))
#> # A tibble: 2 x 3
#>   variable value data_filtered        
#>   <chr>    <chr> <list>               
#> 1 wool     A     <data.frame [27 x 3]>
#> 2 tension  L     <data.frame [18 x 3]>
zack
  • Sometimes base R code can be neater and possibly easier to understand, without having to quote and de-quote expressions. This: `apply(df,1,function(r){warpbreaks[warpbreaks[[r[1]]] == r[2],,drop=FALSE]})` returns a list (instead of a tibble with a data frame list column) but it's an equivalent structure. And very few "weird" things happen. And it doesn't break with new package releases. – Spacedman May 07 '19 at 17:06
  • I agree with you regarding code simplicity and base R, particularly for your example. In terms of OP's question, I generally operate on the assumption that OP's example is useful for their purpose outside of the scope of what's specifically in the question. – zack May 07 '19 at 17:36
  • No problem with your answer (I've not downvoted it!), but I think too many R users have never used square bracket subsetting, or `apply`, and I think it is always a good idea to say "you could use base R for this just as (or more) easily". – Spacedman May 07 '19 at 20:30
  • @zack You had to create a named function because otherwise `!!` operates too early, when the first `mutate()` is called. Actually I should have suggested to use `.data[[name]]` instead of `!!sym(name)`. For that reason, and also because it is easier to understand than `!!`. – Lionel Henry May 08 '19 at 16:30
  • Thanks @LionelHenry, I've updated my answer to include what I think you are describing - if you'd like to answer yourself I'd happily remove it. – zack May 08 '19 at 16:46
  • hmm no that's not what I meant ;) Can you remove the note please? I meant the `.data` pronoun created by tidy eval verbs like `filter()`. But actually there's a timing problem with the `.data` solution as well, so creating a named function is still the best way. – Lionel Henry May 09 '19 at 06:49
  • @Spacedman Using `apply()` with data frames seems a bad suggestion in general because of the conversion to matrix. The data frame must be mono-typed for this to work without surprises. – Lionel Henry May 09 '19 at 07:00
  • @LionelHenry yes but it seems a good idea to understand the surprises in base R and not have to worry about the surprises in every other add on package. – Spacedman May 09 '19 at 07:09
  • Are you campaigning against CRAN? Anyway, I have added an answer using your idea of mapping directly over the data frame. – Lionel Henry May 09 '19 at 07:20
  • thanks @zack the second solution is neat (and yes, your assumption that OP's question is just a simplification of a larger problem was warranted), although I do not really understand the use of ..3. I thought I could use only ..1, ..2 with pmap. With map2, is ..3 referring to... the dataset? Thanks! – Matifou May 10 '19 at 16:32
  • @Matifou - `..3` refers to a third argument being passed to `map2` after the function call. It's utilizing the `...` argument `map2` allows. In this case, I'm passing `warpbreaks`. I wanted to make it an argument rather than calling it straight from the function for the solution to be more general. Hope that made sense. – zack May 10 '19 at 16:52
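
To illustrate the point about ..3 with a minimal sketch (the numbers are made up for illustration and are not part of the question): any argument given to map2() after the function is forwarded to every call, so inside a formula lambda it should be reachable as ..3.

```r
library(purrr)

# The extra argument 10 is forwarded to each call through map2()'s `...`,
# so the formula lambda sees it as ..3 (after ..1 and ..2).
res <- map2(1:2, 3:4, ~ ..1 + ..2 + ..3, 10)
str(res)
```

In the answer above, warpbreaks plays the role of the 10 here: it is passed once and reused for every variable/value pair.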

In your example you're trying to use dplyr verbs in a nested way: there's a filter() inside mutate(). This works well for the normal use, but we need to be a little careful when using tidy eval features because they are applied very early, when the outer function is called. For this reason there's often a timing problem if you try to use !! or .data in the inner verb.

@zack's answer shows how you can decompose the problem into two steps to avoid the nesting issue. In this case, another possibility is to omit the mutate() step by mapping directly over df (credit to @Spacedman for the idea). Here we're going to use pmap(), which maps in parallel over a list or data frame:

# For pretty-printing
options(tibble.print_max = 5, tibble.print_min = 5)
warpbreaks <- as_tibble(warpbreaks)

pmap(df, ~ filter(warpbreaks, .data[[.x]] == .y))
#> [[1]]
#> # A tibble: 27 x 3
#>   breaks wool  tension
#>    <dbl> <fct> <fct>
#> 1     26 A     L
#> 2     30 A     L
#> 3     54 A     L
#> 4     25 A     L
#> 5     70 A     L
#> # … with 22 more rows
#>
#> [[2]]
#> # A tibble: 18 x 3
#>   breaks wool  tension
#>    <dbl> <fct> <fct>
#> 1     26 A     L
#> 2     30 A     L
#> 3     54 A     L
#> 4     25 A     L
#> 5     70 A     L
#> # … with 13 more rows
Lionel Henry
  • thanks! But then I am still facing the problem on how to include this into my main data frame. Surprisingly, `df %>% mutate(data_filtered= pmap(df, ~ filter(warpbreaks, .data[[.x]] == .y)))` does not seem to work!? – Matifou May 10 '19 at 16:26
  • You can assign the list column in your data frame afterwards. Your snippet doesn't work because `.data[[` unquotes its argument (here, `.x`) too early. That's the same issue as using `!!` in a nested tidyeval function. I recognise this is not ideal, but if we fixed that we'd make other patterns harder to do. An easy fix is to give a name to your function and pass the function by name, as in Zack's answer. – Lionel Henry May 11 '19 at 11:23
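
A minimal sketch of the "assign the list column afterwards" suggestion, assuming the same df and warpbreaks as in the question (the intermediate name filtered is only illustrative): pmap() is called on its own, outside any dplyr verb, and the result is attached in a separate step.

```r
library(dplyr)
library(purrr)
library(tibble)

df <- tibble(variable = c("wool", "tension"),
             value    = c("A", "L"))

# Step 1: run pmap() outside of mutate(), so .data[[.x]] is only
# resolved by the inner filter() call.
filtered <- pmap(df, ~ filter(warpbreaks, .data[[.x]] == .y))

# Step 2: attach the resulting list as a list column.
df$data_filtered <- filtered
df
```

Keeping the two steps separate avoids nesting one tidy eval verb inside another, which is where the timing problem described above comes from.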

You can use R's native substitution tools. rlang is more valuable when dealing with environments, but for more complex symbol substitution (nested calls, for example) base R is easier (for me at least).

tibble(variable=c("wool", "tension"), 
       value= c("A", "L")) %>% 
  mutate(data_filtered=map2(variable, value, ~eval(bquote(
    filter(warpbreaks, .(sym(.x)) ==.y)))))

tibble(variable=c("wool", "tension"), 
       value= c("A", "L")) %>% 
  mutate(data_filtered=map2(variable, value, ~eval(substitute(
    filter(warpbreaks, X ==.y), list(X = sym(.x))))))

# output for either
#> # A tibble: 2 x 3
#>   variable value data_filtered        
#>   <chr>    <chr> <list>               
#> 1 wool     A     <data.frame [27 x 3]>
#> 2 tension  L     <data.frame [18 x 3]>
moodymudskipper
  • Nice! Interesting to see R native tool, though I guess `sym()` is still a rlang function?! – Matifou May 10 '19 at 16:33
  • `sym` lives in *rlang* but it's reexported by *dplyr* (`dplyr::sym` exists), it is basically `base::as.name` or `base::as.symbol` (they're aliases) with different checks and treatment of corner cases so you could use those here as well. The idea here anyway was not to try to get free from *rlang* (*dplyr* imports it so it would not make much sense) but to offer a readable solution. – moodymudskipper May 10 '19 at 16:46
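
As a rough illustration of the remark that base::as.name can stand in for sym(), here is a base-R-only sketch (no dplyr; the names variable, value, cond, keep and filtered are just for this example). It builds the condition from strings with bquote() and evaluates it against warpbreaks.

```r
# Build the condition `wool == "A"` from strings using base R only.
variable <- "wool"
value <- "A"
cond <- bquote(.(as.name(variable)) == .(value))

# Evaluate the condition with the data frame's columns in scope,
# then use the resulting logical vector for row subsetting.
keep <- eval(cond, warpbreaks)
filtered <- warpbreaks[keep, , drop = FALSE]
nrow(filtered)
```

This is not a replacement for the answer above, only a way to see what the symbol substitution does when no tidy eval verb is involved.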