2

I would like to filter my tibble on several columns with similar names. Specificall I'd like to compare x with x_new, y with y_new and so on but without specifying the explicit name, but by using the structure in columns names.

I tried to use filter_at, but this is not working as I don't know how to evaluate the formula in the last line properly.

my_df %>% 
  filter_at(vars(contains("_new")), any_vars(funs({
    x <- .
    x_name <- quo_name(quo(x))
    x_new_name <- str_replace(x_name, "_new", "")
    paste(x_name, "!=", x_new_name)
  })
  ))

Data

my_df <- tibble(x = 1:5, 
                x_new = c(1:4, 1), 
                y = letters[1:5], 
                y_new = c(letters[1:3], "a", "e"))

# A tibble: 5 x 4
#       x x_new y     y_new
#   <int> <dbl> <chr> <chr>
# 1     1    1. a     a    
# 2     2    2. b     b    
# 3     3    3. c     c    
# 4     4    4. d     a    
# 5     5    1. e     e   

Expected output

# A tibble: 2 x 4
#       x x_new y     y_new
#   <int> <dbl> <chr> <chr>
# 1     4    4. d     a    
# 2     5    1. e     e 
kath
  • 7,624
  • 17
  • 32
  • You have the last line `paste(x_name, "!=", x_new_name)` which is not evaluated – akrun Apr 30 '18 at 11:57
  • Yes, but how can I evaluate it? I tried `eval`, `!!` but it doesn't work. The problem is, I guess, that the environment is not clear for the evaluation.... – kath Apr 30 '18 at 11:59
  • Also, you are trying to group similar columns and compare. May be you need `map` – akrun Apr 30 '18 at 12:00

1 Answers1

3

We could do this with map. Create a vector of unique names by removing the suffix part of the column names ('nm1'). Loop through the 'nm1', select the columns that matches the column name, reduce it to a single logical vector by checking whether the rows are not equal, then reduce the list of logical vectors to a single logical vector and extract the rows based on that

library(tidyverse)
nm1 <- unique(sub("_.*", "", names(my_df)))
map(nm1, ~ my_df %>% 
                select_at(vars(matches(.x))) %>% 
                reduce(`!=`)) %>% 
       reduce(`|`) %>% 
       magrittr::extract(my_df, ., )
#    x x_new y     y_new
#  <int> <dbl> <chr> <chr>
#1     4     4 d     a    
#2     5     1 e     e    

Another option is to create an expression and then evaluate

library(rlang)
nm1 <- names(my_df) %>% 
          split(sub("_.*", "", .)) %>%
          map(~ paste(.x, collapse=" != ") %>%
                   paste0("(", ., ")")) %>%
          reduce(paste, sep = "|")
my_df %>%
   filter(!! parse_expr(nm1))
# A tibble: 2 x 4
#     x x_new y     y_new
# <int> <dbl> <chr> <chr>
#1    4     4 d     a    
#2    5     1 e     e    
akrun
  • 874,273
  • 37
  • 540
  • 662
  • It works, but I wonder whether there's a solution with `filter_at` or something similar without "exiting" the pipe. – kath Apr 30 '18 at 15:26
  • @kath if you check the output of `quo_name(quo(x))`, it is returning 'x' instead of the column names – akrun May 01 '18 at 04:31
  • That's true, but I changed `x_name <- quo_name(quo(.))`, so the correct name is returned, but still it is not possible to write the `filter`-call in an appropriate way... – kath May 06 '18 at 13:30