Initial situation
I have a data set of the following form:
library(dplyr)
dat <- tribble(
  ~name,  ~iq,
  "ben",  100,
  "alex",  98,
  "mia",  110,
  "paco", 124,
  "mia",  112,
  "mia",  120,
  "paco", 112,
  "ben",   90,
  "alex", 107
)
I'd like to create a new column that ranks the values of iq in descending order, grouped by name. In SQL one could write
select
  name,
  iq,
  row_number() over (partition by name order by iq desc) as rank
from
  dat;
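For comparison, a minimal sketch of what seems to be the more direct dplyr translation of that SQL window, assuming dplyr 1.0 or later: row_number() ranks the values of its argument in ascending order, so wrapping iq in desc() inside a grouped mutate() plays the role of "partition by name order by iq desc".

```r
library(dplyr)

dat <- tribble(
  ~name,  ~iq,
  "ben",  100,
  "alex",  98,
  "mia",  110,
  "paco", 124,
  "mia",  112,
  "mia",  120,
  "paco", 112,
  "ben",   90,
  "alex", 107
)

# group_by() acts like "partition by name";
# row_number(desc(iq)) acts like "row_number() over (order by iq desc)"
ranked <- dat %>%
  group_by(name) %>%
  mutate(rank = row_number(desc(iq))) %>%
  ungroup() %>%
  arrange(name, rank)

print(ranked)
```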
which would produce the following expected output (already ordered for simplicity):
#> name iq rank
#> <chr> <dbl> <int>
#> 1 alex 107 1
#> 2 alex 98 2
#> 3 ben 100 1
#> 4 ben 90 2
#> 5 mia 120 1
#> 6 mia 112 2
#> 7 mia 110 3
#> 8 paco 124 1
#> 9 paco 112 2
Questions
With my data, one can achieve the desired result with:
dat %>%
  group_by(name) %>%
  mutate(rank = with_order(order_by = iq,
                           fun = row_number,
                           x = desc(iq))) %>%
  arrange(name, rank)
#> # A tibble: 9 x 3
#> # Groups: name [4]
#> name iq rank
#> <chr> <dbl> <int>
#> 1 alex 107 1
#> 2 alex 98 2
#> 3 ben 100 1
#> 4 ben 90 2
#> 5 mia 120 1
#> 6 mia 112 2
#> 7 mia 110 3
#> 8 paco 124 1
#> 9 paco 112 2
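A quick check on a hypothetical standalone vector (the three mia scores from dat) suggests why the call works: row_number() ranks the values of whatever it receives as x, so reordering x by order_by first does not change which value gets rank 1 (it could at most affect how ties are broken). The with_order() call therefore gives the same ranks as row_number(desc(iq)), while moving desc() into order_by and passing x = iq gives ascending ranks.

```r
library(dplyr)

iq <- c(110, 112, 120)  # the three "mia" scores from dat

# The call from the question: order_by = iq, x = desc(iq)
a <- with_order(order_by = iq, fun = row_number, x = desc(iq))

# The same ranking without with_order()
b <- row_number(desc(iq))

# Swapping the roles: order_by = desc(iq), x = iq
c_swapped <- with_order(order_by = desc(iq), fun = row_number, x = iq)

a          # 3 2 1
b          # 3 2 1
c_swapped  # 1 2 3
```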
However, I don't understand why the code works. The documentation of dplyr::with_order() describes the arguments as:
- order_by: the vector to order by
- fun: window function
- x, ...: arguments to f
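Reading that argument list literally, with_order() appears to do three things: sort x by order_by, call fun on the sorted x (the f in the documentation presumably refers to fun), and put the result back into the original row order. The hypothetical helper with_order_sketch() below illustrates that reading in base R; it is a simplified sketch, not dplyr's actual source.

```r
# Hypothetical helper illustrating the idea behind dplyr::with_order();
# not dplyr's actual source code.
with_order_sketch <- function(order_by, fun, x, ...) {
  stopifnot(length(order_by) == length(x))
  ord <- order(order_by)   # permutation that sorts order_by ascending
  out <- fun(x[ord], ...)  # fun sees x rearranged in that order
  out[order(ord)]          # invert the permutation: restore original row order
}

# cumsum() depends on the order of its input, so order_by matters here
x <- c(10, 20, 30)
res <- with_order_sketch(order_by = c(3, 1, 2), fun = cumsum, x = x)
res  # 60 20 50
```

With cumsum(), the result depends on the order in which fun sees x, which looks like the situation with_order() is designed for.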
Given the description in the documentation and the working code, I have two questions I cannot answer:
- What is the purpose of the argument x? Why not just specify the vector to order by and the window function (like in SQL)? What is meant by f?
- Why don't I have to write order_by = desc(iq)? To get the result I expect, I have to write x = desc(iq) and set order_by = iq. This seems to contradict the documentation, which states that order_by is "the vector to order by".
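The place where order_by = desc(...) does seem to behave as the documentation suggests is with a window function whose output depends on the order of its input, such as dplyr::lag(). A minimal sketch, assuming dplyr 1.0 or later and again using the mia scores as a hypothetical standalone vector:

```r
library(dplyr)

iq <- c(110, 112, 120)  # the three "mia" scores from dat

# lag() returns the previous element, so its result depends on input order.
# Ordering by desc(iq): lag() walks from highest to lowest score,
# so each score is paired with the next higher score.
next_higher <- with_order(order_by = desc(iq), fun = dplyr::lag, x = iq)

# Ordering by iq: lag() walks from lowest to highest score,
# so each score is paired with the next lower score.
next_lower <- with_order(order_by = iq, fun = dplyr::lag, x = iq)

next_higher  # 112 120  NA
next_lower   #  NA 110 112
```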