How to specify columns to exclude when retaining all distinct rows?

Question

How do you retain all distinct rows in a data frame, ignoring certain columns, by specifying only the columns you want to exclude? Consider the example below:

library(dplyr)
dat <- tibble(
    x = c("a", "a", "b"),
    y = c("c", "c", "d"),
    z = c("e", "f", "f")
)

I'd like to return a data frame with all distinct rows across variables x and y, specifying only that column z should be excluded. The result should look like the data frame returned by this:

dat %>% distinct(x, y)

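You would think you could do the following, but it results in an error (`distinct()` treats `-z` as an expression to compute rather than as a column to drop, and negating the character column `z` fails):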

dat %>% distinct(-z)

I'd prefer a tidyverse solution.

So maybe do `select(-z)` first? – joran Feb 19 '19 at 17:16
What's wrong with `unique(dat[1:2])`? – jay.sf Feb 19 '19 at 17:18
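
Both comments boil down to dropping the unwanted columns before deduplicating. Here is a minimal base-R sketch of that idea that names the excluded column instead of relying on positions like `1:2`. It assumes `dat` is built as a plain `data.frame` (not a tibble), so no packages are needed; the variable name `exclude` is illustrative.

```r
# Same example data, as a plain data.frame
dat <- data.frame(
  x = c("a", "a", "b"),
  y = c("c", "c", "d"),
  z = c("e", "f", "f"),
  stringsAsFactors = FALSE
)

# Name only the column(s) to exclude
exclude <- "z"

# Logical column index: TRUE for every column not listed in `exclude`;
# then unique() drops repeated rows among the remaining columns
res <- unique(dat[, !(names(dat) %in% exclude), drop = FALSE])
res
#   x y
# 1 a c
# 3 b d
```

Unlike the tibble output, the result keeps the original row names (1 and 3) of the rows that survived.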

arg0naut91 · Accepted Answer · 2020-07-27T19:48:34.373


Just do:

library(dplyr)

dat %>%
  distinct_at(vars(-z))

Output:

# A tibble: 2 x 2
  x     y    
  <chr> <chr>
1 a     c    
2 b     d

As of dplyr 1.0.0, you can also use `across()` inside `distinct()`:

dat %>% 
  distinct(across(-z))

edited Jul 27 '20 at 19:48 · answered Feb 19 '19 at 17:21 · arg0naut91


As of dplyr 1.0.0, `distinct_at()` is superseded by using `across()` inside `distinct()` (https://dplyr.tidyverse.org/news/index.html#across). The equivalent of your answer would be `dat %>% distinct(across(-z))`, but `distinct_at()` will remain available for several years. – zek19 Jul 23 '20 at 16:28
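
The answers above return only the non-excluded columns. If you also want to keep the excluded column, taking its value from the first row of each group, dplyr's `distinct()` has a `.keep_all = TRUE` argument for that. A rough base-R analogue is sketched below, assuming a plain `data.frame` and that the first occurrence of each x/y pair is the row you want to keep:

```r
dat <- data.frame(
  x = c("a", "a", "b"),
  y = c("c", "c", "d"),
  z = c("e", "f", "f"),
  stringsAsFactors = FALSE
)

# Columns that define uniqueness: everything except the excluded one
keep <- setdiff(names(dat), "z")

# duplicated() flags rows whose x/y values already appeared earlier;
# negating it keeps only first occurrences, with all columns (z included)
first_rows <- dat[!duplicated(dat[keep]), ]
first_rows
#   x y z
# 1 a c e
# 3 b d f
```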

score 1 · Answer 2 · answered Feb 19 '19 at 17:18


We could use `setdiff()` to get the names of the remaining columns and splice them in as symbols:

dat %>% 
    distinct(!!! rlang::syms(setdiff(names(.), "z")))
# A tibble: 2 x 2
#  x     y    
#  <chr> <chr>
#1 a     c    
#2 b     d

akrun
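
The `setdiff(names(.), "z")` step in the second answer is what turns "exclude z" into "keep everything else". One thing a string-based exclusion does not do by itself is complain about a misspelled name: `setdiff(names(.), "zz")` silently keeps every column. A sketch of a reusable wrapper is shown below; `distinct_except()` is a hypothetical helper name (not part of base R or dplyr), and it assumes the columns to drop arrive as a character vector.

```r
# Hypothetical helper, not part of base R or dplyr:
# distinct rows over all columns except those named in `exclude`
distinct_except <- function(df, exclude) {
  # Fail loudly on typos instead of silently keeping every column
  unknown <- setdiff(exclude, names(df))
  if (length(unknown) > 0) {
    stop("Unknown column(s): ", paste(unknown, collapse = ", "))
  }
  keep <- setdiff(names(df), exclude)  # same idea as setdiff(names(.), "z")
  out <- unique(df[keep])
  rownames(out) <- NULL                # renumber rows 1..n
  out
}

dat <- data.frame(
  x = c("a", "a", "b"),
  y = c("c", "c", "d"),
  z = c("e", "f", "f"),
  stringsAsFactors = FALSE
)

distinct_except(dat, "z")
#   x y
# 1 a c
# 2 b d
distinct_except(dat, c("y", "z"))
#   x
# 1 a
# 2 b
```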
