
I am trying to write a function that deduplicates my grouped data frame. It asserts that the values in each group are all the same and then keeps only the first row of each group. I am trying to give it tidyselect-like semantics, like those seen in pivot_longer(), because I just need to forward the column names into a summarise(a = n_distinct(...)) call.

For example, given the table

library(dplyr)

test <- tribble(
  ~G,  ~F, ~v1, ~v2,
  "A", "a",  1,   2,
  "A", "b",  1,   2,
  "B", "a",  3,   3,
  "B", "b",  3,   3) %>%
  group_by(G)

I expect the call remove_duplicates(test, c(v1, v2)) (using the tidyselect helper c()) to return

G   F  v1  v2
A   a   1   2
B   a   3   3

but I get

Error: `arg` must be a symbol

I tried to use the new "embrace" syntax to solve this (see function code below), which fails with the message shown above.

# Assert that values in each group are identical and keep the first row of each
# group
# tab: A grouped tibble
# vars: <tidy-select> Columns expected to be constant throughout the group
remove_duplicates <- function(tab, vars){
  # Assert identical results for identical models and keep only the first per group.
  tab %>%
    summarise(a = n_distinct({{{vars}}}) == 1, .groups = "drop") %>%
    {stopifnot(all(.$a))}
  # Remove duplicates
  tab <- tab %>%
    slice(1) %>%
    ungroup() 
  return(tab)
}

I think I would somehow need to evaluate the expression vars in the context of the sub-data-frame of tab that summarise() is currently processing (the current group), so something like

tab %>%
  summarise(a = do.call(n_distinct, TIDYSELECT_TO_LIST_OF_VECTORS(vars, context = CURRENT_GROUP)))

but I do not understand the technical details well enough to make this work.
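The difficulty here is the difference between data masking and tidy selection. Inside summarise(), an expression such as `c(v1, v2)` is evaluated with the current group's columns as ordinary vectors, so `c()` simply concatenates them. The `.cols` argument of `across()`, by contrast, is a tidy-select context in which `c(v1, v2)` means "these two columns". A minimal sketch comparing the two on the example table, assuming dplyr >= 1.0.0 (for `across()` and `.groups`):

```r
library(dplyr)

test <- tibble::tribble(
  ~G,  ~F, ~v1, ~v2,
  "A", "a",  1,   2,
  "A", "b",  1,   2,
  "B", "a",  3,   3,
  "B", "b",  3,   3
) %>%
  group_by(G)

# Data masking: c(v1, v2) concatenates the group's two columns into one vector,
# so group A yields c(1, 1, 2, 2), which has 2 distinct values.
masked <- test %>%
  summarise(n = n_distinct(c(v1, v2)), .groups = "drop")

# Tidy selection: across() reads c(v1, v2) as a column selection and applies
# n_distinct() to each selected column separately within each group.
selected <- test %>%
  summarise(across(c(v1, v2), n_distinct), .groups = "drop")

masked
selected
```

So the "list of vectors for the current group" that the placeholder above describes is roughly what `across()` hands to its function, one selected column at a time.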

akraf

1 Answer


This works as expected if you first capture vars with enquos() and then use the curly-curly operator on the result:

remove_duplicates <- function(tab, vars){
  
  vars <- enquos(vars)

  tab %>%
    summarise(a = n_distinct({{vars}}) == 1, .groups = "drop") %>%
    {stopifnot(all(.$a))}

  tab %>% slice(1) %>% ungroup()
}

So now

remove_duplicates(test, c(v1, v2))
#> # A tibble: 2 x 4
#>   G     F        v1    v2
#>   <chr> <chr> <dbl> <dbl>
#> 1 A     a         1     2
#> 2 B     a         3     3
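For comparison, here is a sketch that keeps the tidyselect semantics asked for in the question without calling enquos(): the `.cols` argument of `across()` is a tidy-select context, so `{{ vars }}` can be forwarded to it directly, and `n_distinct()` is then applied to each selected column within each group. A group passes when every selected column has exactly one distinct value, which is the same as all of its rows agreeing on those columns. It assumes dplyr >= 1.0.0, and the name `remove_duplicates_across()` is only illustrative. Changing one value so that a group is no longer constant is a quick way to confirm that the assertion can actually fail, which is worth checking for any version of this function.

```r
library(dplyr)

# Hypothetical variant of remove_duplicates(): vars is forwarded with {{ }}
# into the .cols argument of across(), which uses tidy selection.
remove_duplicates_across <- function(tab, vars) {
  # One row per group, one column per selected variable, holding the number
  # of distinct values of that variable within the group.
  counts <- tab %>%
    summarise(across({{ vars }}, n_distinct), .groups = "drop") %>%
    select(-all_of(group_vars(tab)))

  # Every selected column must be constant within every group.
  stopifnot(all(counts == 1))

  # Keep the first row of each group.
  tab %>% slice(1) %>% ungroup()
}

test <- tibble::tribble(
  ~G,  ~F, ~v1, ~v2,
  "A", "a",  1,   2,
  "A", "b",  1,   2,
  "B", "a",  3,   3,
  "B", "b",  3,   3
) %>%
  group_by(G)

result <- remove_duplicates_across(test, c(v1, v2))
result

# Make group A non-constant in v1; the assertion should now stop the function.
bad <- test %>% mutate(v1 = if_else(G == "A" & F == "b", 5, v1))
failed <- tryCatch({
  remove_duplicates_across(bad, c(v1, v2))
  FALSE
}, error = function(e) TRUE)
failed
```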
Allan Cameron
  • Awesome, thank you! However, then I don't understand why the new `{{` operator exists: if I write `!!vars` instead of `{{vars}}`, it works as well. What is the benefit of `{{`? – akraf Sep 15 '20 at 09:28
  • @akraf this seems to be a problem with `n_distinct` rather than with the way you are using the curly-curly operator. Perhaps it is because both `summarise` and `n_distinct` use `enquos` internally? – Allan Cameron Sep 15 '20 at 10:12
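On the question in the comments: the rlang documentation describes `{{ x }}` as a shorthand that captures a function argument with `enquo()` and injects it with `!!` in one step. So for a single column argument, the two spellings below should behave the same, and `{{ }}` mainly saves the separate `enquo()` call. A minimal sketch; the helper names are hypothetical:

```r
library(dplyr)
library(rlang)

# Hypothetical helper: capture explicitly with enquo(), then inject with !!.
count_distinct_bang <- function(df, var) {
  var <- enquo(var)
  summarise(df, n = n_distinct(!!var), .groups = "drop")
}

# Hypothetical helper: {{ var }} captures and injects in a single step.
count_distinct_curly <- function(df, var) {
  summarise(df, n = n_distinct({{ var }}), .groups = "drop")
}

test <- tibble::tribble(
  ~G,  ~F, ~v1, ~v2,
  "A", "a",  1,   2,
  "A", "b",  1,   2,
  "B", "a",  3,   3,
  "B", "b",  3,   3
) %>%
  group_by(G)

# F refers to the data column here, because the data mask takes precedence.
bang <- count_distinct_bang(test, F)
curly <- count_distinct_curly(test, F)
identical(bang, curly)
```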