Is there a way to specify dplyr::distinct should use all column names without resorting to nonstandard evaluation?

df <- data.frame(a=c(1,1,2),b=c(1,1,3))

df %>% distinct(a,b,.keep_all=FALSE)          # behavior I'd like to replicate

vs

df %>% distinct(everything(),.keep_all=FALSE) # with syntax of this form
Peter O.
Sam Hart
  • Does `df %>% distinct()` give you what you want? – sboysel Mar 24 '16 at 00:49
  • Unfortunately it doesn't. I believe passing the data frame as the sole argument used to yield the correct result; however, recent releases have seen changes to the distinct function. I currently get: `Error: No variables selected` – Sam Hart Mar 24 '16 at 16:28
  • `df %>% unique` works as an alternative, though not the most satisfying answer. – Sam Hart Mar 24 '16 at 21:58
  • Is this a new bug in `dplyr`? I swear I saw it work fine. Getting same error of no variables selected. – Gopala Apr 04 '16 at 15:24
  • @Gopala, not a bug. Just a design decision in the new version. I used distinct() without arguments pretty frequently; now I use unique() for the same purpose. – Sam Hart Apr 05 '16 at 18:35
  • But, unique is terribly slow. :) – Gopala Apr 05 '16 at 21:43

2 Answers


You can keep the distinct rows across all columns with the code below.

library(dplyr)
library(data.table)

df <- tibble(  # tibble() replaces the deprecated data_frame()
  id = c(1, 1, 2, 2, 3, 3),
  value = c("a", "a", "b", "c", "d", "d")
)
# A tibble: 6 × 2
# id value
# <dbl> <chr>
# 1     1     a
# 2     1     a
# 3     2     b
# 4     2     c
# 5     3     d
# 6     3     d

# distinct with Non-Standard Evaluation
df %>% distinct()

# distinct with Standard Evaluation
df %>% distinct_()

# Also, you can set the column names with .dots.
df %>% distinct_(.dots = names(.))
# A tibble: 4 × 2
# id value
# <dbl> <chr>
# 1     1     a
# 2     2     b
# 3     2     c
# 4     3     d

# distinct with data.table
unique(as.data.table(df))
# id value
# 1:  1     a
# 2:  2     b
# 3:  2     c
# 4:  3     d
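
The underscore verbs such as `distinct_()` were deprecated in later dplyr releases, so the `.dots = names(.)` idea may need a different spelling there. The following is only a sketch of one possible equivalent, assuming a dplyr version with tidy evaluation (0.7 or later) and the rlang package that ships alongside it. The names `cols`, `by_names`, and `by_unique` are illustrative and not part of the original answer.

```r
library(dplyr)

# Same example data as above, built with tibble().
df <- tibble(
  id = c(1, 1, 2, 2, 3, 3),
  value = c("a", "a", "b", "c", "d", "d")
)

# Column names held as an ordinary character vector.
cols <- names(df)

# Turn each name into a symbol and splice the symbols into distinct().
by_names <- df %>% distinct(!!!rlang::syms(cols))

# Base R equivalent, for comparison.
by_unique <- unique(df)
```

Because `cols` is a plain character vector, it can be computed or filtered before the call, which is the main reason to prefer this form over typing the column names by hand.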
Keiku

As of version 1.0.5 of dplyr, the following two options yield the same output.

df <- data.frame(a = c(1, 1, 2),
                 b = c(1, 1, 3))

df %>% distinct(a, b)

  a b
1 1 1
2 2 3

df %>% distinct(across(everything()))

  a b
1 1 1
2 2 3

There is no need to specify the `.keep_all = FALSE` argument, as this is the default.

You could also use `tibble()` instead of `data.frame()`.
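
To make the role of `.keep_all` concrete, here is a small sketch using the same data. It assumes a recent dplyr release in which, as far as I understand the current documentation, calling `distinct()` with no columns uses every column. The error quoted in the comments on the question came from an older version. The object names `all_cols`, `only_a`, and `keep_rest` are made up for illustration.

```r
library(dplyr)

df <- data.frame(a = c(1, 1, 2),
                 b = c(1, 1, 3))

# No columns given: rows are compared on every column.
all_cols <- df %>% distinct()

# One column given, default .keep_all = FALSE: only that column is returned.
only_a <- df %>% distinct(a)

# .keep_all = TRUE: rows are compared on `a`, but all columns are kept,
# taking the first row for each distinct value of `a`.
keep_rest <- df %>% distinct(a, .keep_all = TRUE)
```

With this data `all_cols` and `keep_rest` happen to contain the same rows. They would differ if two rows shared a value of `a` but not of `b`.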

gradcylinder