
How do you retain all distinct rows in a data frame, ignoring certain columns, by specifying only the columns you want to exclude? Consider the following example:

library(dplyr)
dat <- tibble(
    x = c("a", "a", "b"),
    y = c("c", "c", "d"),
    z = c("e", "f", "f")
)

I'd like to return a data frame with all distinct rows among variables x and y, specifying only that column z should be excluded. The result should match the output of:

dat %>% distinct(x, y)

You might expect the following to work, but it results in an error:

dat %>% distinct(-z)

I'd prefer a tidyverse solution.

jay.sf
David Rubinger

2 Answers


Just do:

library(dplyr)

dat %>%
  distinct_at(vars(-z))

Output:

# A tibble: 2 x 2
  x     y    
  <chr> <chr>
1 a     c    
2 b     d    

And as of dplyr 1.0.0, you can use across:

dat %>% 
  distinct(across(-z))
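
For comparison, here is a minimal sketch of another common idiom that is not part of the original answer: drop the unwanted column with select() first, then call distinct() with no arguments. It assumes the excluded column is not needed in the result, which matches the output shown above. The name res is used only for illustration.

```r
library(dplyr)

# Same example data as in the question
dat <- tibble(
    x = c("a", "a", "b"),
    y = c("c", "c", "d"),
    z = c("e", "f", "f")
)

# Drop z, then keep the unique rows of the remaining columns
res <- dat %>%
  select(-z) %>%
  distinct()

res
```

Because select() accepts the usual negative selection, several columns can be excluded at once in the same way, for example with select(-c(y, z)).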
arg0naut91
    `distinct_at()` will be superseded by use of `across()` inside `distinct()` from version 1.0.0 (https://dplyr.tidyverse.org/news/index.html#across). The equivalent pattern for your answer would be `dat %>% distinct(across(-z))` but `distinct_at()` will still be available for several years. – zek19 Jul 23 '20 at 16:28

We could also splice in every column name except z as symbols:

dat %>% 
    distinct(!!! rlang::syms(setdiff(names(.), "z")))
# A tibble: 2 x 2
#  x     y    
#  <chr> <chr>
#1 a     c    
#2 b     d    
akrun