Summary:

I was reading an article about dplyr's across function. In the first usage example, I saw operators that I had never seen before. I do not know whether they are part of dplyr itself or come from some other package. Either way, I do not understand their use in the code.

Code Example:

starwars |>
    summarize(across(where(is.character), ~ length(unique(.x))))

The result is a 1 x 8 tibble.

I understand the first argument to across; it is the second argument that perplexes me. What does ~length(unique(.x)) mean? What does .x mean? I understand that length is being applied to every character vector in the tibble, but what does "unique" do in this code fragment?

What have I tried to resolve this problem myself? I searched Google for [R] ~ operator and received no relevant results. I also tried rdrr.io, r-project.org, and CRAN without finding an answer. I also checked tidyverse.org and the documentation for purrr, because I had seen someone reference purrr when using the very same syntax in their code.

Question:

Can someone help me understand what is happening internally?

user438383
student-R
  • These posts might help - https://stackoverflow.com/questions/53159979/tilde-dot-in-r and https://stackoverflow.com/questions/44834446/what-is-meaning-of-first-tilde-in-purrrmap – Ronak Shah Jul 05 '21 at 02:27

2 Answers

This is called a purrr-style lambda. It starts with a tilde ~ and uses .x to refer to each individual column that has been selected in the .cols argument. So:

# We can either use
starwars |> summarize(across(where(is.character), ~ length(unique(.x)))) 

# Or we can define our anonymous function like this
starwars |> summarize(across(where(is.character), function(x) length(unique(x)))) 

They are equivalent. Note that the function is applied only to the columns of the starwars data set that are of class character. The selection is done by where(), a helper that applies a predicate function (here is.character) to each column and keeps only the columns for which it returns TRUE. The anonymous function is then applied to each selected column, and each result is stored in its own column of the output. So .x here stands for each character column in turn. Also note that |> is R's native pipe operator, introduced in R 4.1.0.
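To make the pipe concrete, here is a minimal base R sketch (it assumes R 4.1.0 or later; the vector x is made up for illustration). It shows that the piped form is just another way of writing a nested call, with the left-hand side becoming the first argument of the call on the right.

```r
# Base R only; the native pipe |> needs R >= 4.1.0.
x <- c("a", "b", "a", "c")   # hypothetical example vector

# Nested call: read from the inside out.
nested <- length(unique(x))

# Piped call: the left-hand side becomes the first argument on the right.
piped <- x |> unique() |> length()

# Extra arguments stay where they are: this is the same as head(x, 2).
with_args <- x |> head(2)

identical(nested, piped)
identical(with_args, head(x, 2))
```

Both comparisons should come out TRUE, since the pipe only changes how the call is written, not what is called.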

We normally use %>% with tidyverse packages, but both do the same job: they pass the result of the LHS as the first argument of the RHS, which is normally (and in this case) the .data argument. One more thing worth mentioning here, though it is used for a different purpose in the answer below: to understand how a lambda-style formula is interpreted in the back end, we can use as_mapper, which is the powerhouse behind the varied function specifications that most purrr functions allow:

library(purrr)

> as_mapper(~ length(unique(.x)))
<lambda>
function (..., .x = ..1, .y = ..2, . = ..1) 
length(unique(.x))
attr(,"class")
[1] "rlang_lambda_function" "function" 

If you look at its output, you will see that the formula is turned into the kind of anonymous function we usually write.

For more info please read this documentation.
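As a small sketch of the column selection and the lambda working together, the following uses a made-up tibble (toy and its column names are hypothetical, not part of starwars) so the result is easy to check by eye. It assumes dplyr 1.0.0 or later and R 4.1.0 or later.

```r
library(dplyr)

# Hypothetical toy data: two character columns and one integer column.
toy <- tibble(
  colour = c("red", "blue", "red"),
  shape  = c("disc", "disc", "disc"),
  size   = c(1L, 2L, 3L)
)

# purrr-style lambda: .x stands for each selected column in turn.
with_lambda <- toy |>
  summarize(across(where(is.character), ~ length(unique(.x))))

# The same computation with an ordinary anonymous function.
with_function <- toy |>
  summarize(across(where(is.character), function(x) length(unique(x))))

with_lambda

# Compare the two results as plain named vectors.
identical(unlist(with_lambda), unlist(with_function))
```

Only colour and shape appear in the result (2 and 1 distinct values respectively); size is skipped because is.character returns FALSE for it.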

Anoushiravan R
  • So, for each character vector variable in my tibble, the function unique is applied to each of them. If I have three character vector variables: a, b, and c in my tibble starwars, then if a is defined like this a <- c("equivalent", "but", "for", "then", "so", "just", "we", "for") then unique(a) returns the character vector [1] "equivalent" "but" "for" "then" "so" "just" "we". The last element in a, "for", is not repeated. So, length(unique(a)) returns a single integer that counts the distinct values in a, which here is [1] 7. (Counting the characters in each string would instead be nchar(unique(a)), which gives [1] 10 3 3 4 2 4 2.) Right? – student-R Jul 05 '21 at 01:11
  • Yes, precisely. – Anoushiravan R Jul 05 '21 at 10:41
  • It would be a good idea to consider this answer in conjunction with the answer proposed by @jpdugo. You can always put your purrr-style lambda formula in a call to `as_mapper` to see how it is turned into an anonymous function in the back end. – Anoushiravan R Jul 05 '21 at 10:47
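
For the vector a from the comments above, the difference between counting distinct elements and counting characters can be checked directly in base R. This is only a quick sketch; the variable names distinct_count and chars_per_string are made up for illustration.

```r
a <- c("equivalent", "but", "for", "then", "so", "just", "we", "for")

# unique() drops the repeated "for", leaving 7 distinct strings.
unique(a)

# length() counts elements, so this is one number: how many distinct strings.
distinct_count <- length(unique(a))
distinct_count    # 7

# nchar() is the function that counts characters within each string.
chars_per_string <- nchar(unique(a))
chars_per_string  # 10 3 3 4 2 4 2
```

So inside across(), ~ length(unique(.x)) produces one count of distinct values per column, not one number per string.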

~ is a shortcut for writing lambda functions in the tidyverse. ~length(unique(.x)) will yield the same result as function(.x) {length(unique(.x))}.

This way of writing a function is most commonly used with the purrr library. In fact, purrr::as_mapper() recognizes this syntax and returns a function that can be called like any other function in R.

For example, the function f below tells me how many unique values .x has. .x can also be referred to as . or ..1. If you have two arguments, use .x and .y. And finally, with n arguments, use ..1, ..2, ..3, ..., ..n.

require(purrr)
#> Loading required package: purrr

f <- as_mapper(~length(unique(.x)))

f2 <- as_mapper(~length(unique(.)))

f3 <- as_mapper(~length(unique(..1)))
  
f(mtcars$mpg) 
#> [1] 25
f2(mtcars$mpg)
#> [1] 25
f3(mtcars$mpg)
#> [1] 25

Created on 2021-07-05 by the reprex package (v2.0.0)
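
The example above only uses a single argument. As a minimal sketch of the two-argument and positional forms mentioned earlier (the names add, add_positional and third are made up for illustration), assuming purrr is installed:

```r
library(purrr)

# Two arguments: .x is the first, .y is the second.
add <- as_mapper(~ .x + .y)
add(1, 2)                  # 3

# The same function written with positional placeholders.
add_positional <- as_mapper(~ ..1 + ..2)
add_positional(1, 2)       # 3

# With more than two arguments, only ..1, ..2, ..3, ... are available.
third <- as_mapper(~ ..3)
third("a", "b", "c")       # "c"

# map2_* functions pass one element from each input as .x and .y.
products <- map2_dbl(c(1, 2), c(10, 20), ~ .x * .y)
products                   # 10 40
```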

Another example:

library(dplyr)
library(purrr)

# In all starwars columns that are character vectors, how many unique values are there?

starwars |>
  summarize(across(where(is.character), ~ length(unique(.x))))
#> # A tibble: 1 x 8
#>    name hair_color skin_color eye_color   sex gender homeworld species
#>   <int>      <int>      <int>     <int> <int>  <int>     <int>   <int>
#> 1    87         13         31        15     5      3        49      38

f <- as_mapper(~ if (is.character(.x)) length(unique(.x)) else NULL)

f_base <- function(.x) {if (is.character(.x)) length(unique(.x)) else NULL}

starwars %>% 
  map_dfc(f)
#> # A tibble: 1 x 8
#>    name hair_color skin_color eye_color   sex gender homeworld species
#>   <int>      <int>      <int>     <int> <int>  <int>     <int>   <int>
#> 1    87         13         31        15     5      3        49      38

starwars %>% 
  map_dfc(f_base)
#> # A tibble: 1 x 8
#>    name hair_color skin_color eye_color   sex gender homeworld species
#>   <int>      <int>      <int>     <int> <int>  <int>     <int>   <int>
#> 1    87         13         31        15     5      3        49      38

jpdugo17