
Given the dplyr workflow:

require(dplyr)                                      
mtcars %>% 
    tibble::rownames_to_column(var = "model") %>% 
    filter(grepl(x = model, pattern = "Merc")) %>% 
    group_by(am) %>% 
    summarise(meanMPG = mean(mpg))

I'm interested in conditionally applying filter depending on the value of applyFilter.

Solution

With applyFilter <- 1, the rows are filtered using the "Merc" string; without the filter, all rows are returned.

applyFilter <- 1


mtcars %>%
  tibble::rownames_to_column(var = "model") %>%
  filter(model %in%
           if (applyFilter) {
             rownames(mtcars)[grepl(x = rownames(mtcars), pattern = "Merc")]
           } else
           {
             rownames(mtcars)
           }) %>%
  group_by(am) %>%
  summarise(meanMPG = mean(mpg))

Problem

The suggested solution is inefficient because the if/else expression inside filter() is always evaluated; a more desirable approach would only evaluate the filter step when applyFilter <- 1.

Attempt

Ideally, the workflow would look like this:

mtcars %>% 
    tibble::rownames_to_column(var = "model") %>% 
    # Only apply filter step if condition is met
    if (applyFilter) { 
        filter(grepl(x = model, pattern = "Merc"))
        }
    %>% 
    # Continue 
    group_by(am) %>% 
    summarise(meanMPG = mean(mpg))

Naturally, the syntax above is incorrect. It's only an illustration of how the ideal workflow should look.


Desired answer

  • I'm not interested in creating an interim object; the workflow should resemble:

    startingObject
        %>%
        ...
        conditional filter
        ...
        final object
    
  • Ideally, I would like to arrive at a solution where I can control whether the filter call is evaluated at all

Konrad
  • There is a bug in your `ifelse()` based solution: `ifelse(1, rownames(mtcars)[grepl(x = rownames(mtcars), pattern = "Merc")], rownames(mtcars))` gives `[1] "Merc 240D"` and not the expected `[1] "Merc 240D" "Merc 230" "Merc 280" "Merc 280C" "Merc 450SE" [6] "Merc 450SL" "Merc 450SLC"`. The value of `ifelse()` is the same length as `test`. For instance `ifelse(c(T, F), 1:3, 4:9)` gives `[1] 1 5` and no warning of what's gone wrong. – Aurèle May 16 '17 at 12:54

1 Answer


How about this approach:

mtcars %>% 
    tibble::rownames_to_column(var = "model") %>% 
    filter(if (applyFilter == 1) grepl(x = model, pattern = "Merc") else TRUE) %>% 
    group_by(am) %>% 
    summarise(meanMPG = mean(mpg))

This means grepl is only evaluated if applyFilter is 1; otherwise filter simply recycles a single TRUE across all rows, so every row is kept.
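
As a quick check of the recycling behaviour, here is a minimal sketch that runs the same filter call with applyFilter set to 0 and then to 1. The names cars, all_rows and merc_rows are introduced here purely for illustration and are not part of the question.

```r
library(dplyr)

# Illustrative object names (cars, all_rows, merc_rows) are assumptions
cars <- tibble::rownames_to_column(mtcars, var = "model")

# Condition not met: the if/else returns a single TRUE, which filter() recycles
applyFilter <- 0
all_rows <- cars %>%
  filter(if (applyFilter == 1) grepl(x = model, pattern = "Merc") else TRUE)

# Condition met: grepl() is evaluated and only matching models are kept
applyFilter <- 1
merc_rows <- cars %>%
  filter(if (applyFilter == 1) grepl(x = model, pattern = "Merc") else TRUE)

nrow(all_rows)   # same number of rows as mtcars
nrow(merc_rows)  # only the "Merc" models
```

Note that filter() itself still runs in both cases; only the grepl call is skipped.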


Or another option is to use {}:

mtcars %>% 
  tibble::rownames_to_column(var = "model") %>% 
  {if (applyFilter == 1) filter(., grepl(x = model, pattern = "Merc")) else .} %>% 
  group_by(am) %>% 
  summarise(meanMPG = mean(mpg))
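
If the conditional step is needed in several pipelines, the {} idea could be wrapped in a small helper. The sketch below assumes a hypothetical function filter_when (it is not part of dplyr) that calls filter only when the condition holds and otherwise returns the data untouched.

```r
library(dplyr)

# Hypothetical helper, not provided by dplyr: calls filter() only when
# `condition` is TRUE, otherwise passes the data through unchanged
filter_when <- function(df, condition, ...) {
  if (isTRUE(condition)) filter(df, ...) else df
}

cars <- tibble::rownames_to_column(mtcars, var = "model")

# filter() is called here
filtered <- cars %>%
  filter_when(TRUE, grepl(x = model, pattern = "Merc"))

# filter() is never called here
skipped <- cars %>%
  filter_when(FALSE, grepl(x = model, pattern = "Merc"))

applyFilter <- 1
cars %>%
  filter_when(applyFilter == 1, grepl(x = model, pattern = "Merc")) %>%
  group_by(am) %>%
  summarise(meanMPG = mean(mpg))
```

Because the else branch returns df as is, the filter step is skipped entirely rather than evaluated to all TRUE.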

Another possible approach is to break the pipe, apply the filter conditionally, and then continue the pipe. (I know the OP didn't ask for this; it is just another example for other readers.) It uses the %<>% assignment pipe from magrittr:

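library(magrittr)  # provides the %<>% assignment pipe

# Note: %<>% overwrites mtcars in the current environment
mtcars %<>% 
  tibble::rownames_to_column(var = "model")

# Only apply the filter step if the condition is met
if (applyFilter == 1) mtcars %<>% filter(grepl(x = model, pattern = "Merc"))

mtcars %>% 
  group_by(am) %>% 
  summarise(meanMPG = mean(mpg))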
talat
  • Thanks for the contribution; an approach that returns all `T` had crossed my mind. I'm happy to accept; however, in practice I was wondering whether it's possible to force `dplyr` to conditionally *skip* a step. In your answer the `filter()` is evaluated to all `T`. – Konrad May 16 '17 at 12:47
  • @Konrad, I added another approach – talat May 16 '17 at 12:54
  • I did not know about the `%<>%` operator. That is very nice. – Marijn Stevering May 17 '17 at 15:12