My goal is to write a function `take_by_rank` that

- can operate on an arbitrary selection of numeric columns within a data frame;
- uses non-standard evaluation like `base::subset` or `dplyr` verbs;
- understands the minus sign naturally, so that `-foo` means "the largest value of `foo` gets the lowest rank";
- returns the `n` top or bottom rows by the total rank, which is the sum of the ranks computed for each of the selected variables.
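To pin down the ranking rule, here is a small base R illustration on made-up vectors `foo` and `bar` (toy data, not taken from the question). Negating a variable before ranking gives its largest value rank 1, and the total rank is the element-wise sum of the per-variable ranks. Ties get the average rank, which is the default of both `rank()` and `data.table::frankv()`.

```r
# Toy data, invented for illustration only
foo <- c(3, 10, 7, 7)
bar <- c(5, 1, 8, 2)

# Plain rank: the smallest value gets rank 1; tied values share the average rank
rank(foo)    # 1.0 4.0 2.5 2.5

# Ranking the negated variable: the largest value of foo now gets rank 1
rank(-foo)   # 4.0 1.0 2.5 2.5

# Total rank for the selection "bar, -foo" is the element-wise sum of the ranks
total <- rank(bar) + rank(-foo)
total        # 7.0 2.0 6.5 4.5

# Row indices ordered by total rank; the first two are the two "top" rows
head(order(total), 2)   # 2 4
```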
I'm interested both in learning the newest dplyr way and in alternative approaches; there is no restriction on packages (pure base R, or maybe data.table?).
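For the pure base option, here is a minimal sketch under a few assumptions: the name `take_by_rank_base` is hypothetical, the arguments are captured with `substitute()` and evaluated inside the data frame with `eval()` (the same mechanism `subset()` relies on), and `rank(x, na.last = "keep")` stands in for `frankv()`, since both average ties and keep `NA` as `NA`. `lapply()` plus `Reduce()` replace the for loop. Unlike `top_n()`, this version returns exactly `|n|` rows and breaks ties by the original row order.

```r
# Hypothetical base R variant with the same interface as take_by_rank()
take_by_rank_base <- function(df, ..., n = 100) {
  # Capture the unevaluated arguments (e.g. mpg, -qsec), as subset() does
  exprs <- as.list(substitute(list(...)))[-1]
  if (!length(exprs)) stop("No variables to rank!")
  env <- parent.frame()
  # Evaluate each expression inside df and rank it (ties averaged, NA kept)
  ranks <- lapply(exprs, function(e) rank(eval(e, df, env), na.last = "keep"))
  names(ranks) <- paste0(".rank_", vapply(exprs, function(e)
    paste(deparse(e), collapse = ""), character(1)))
  out <- df
  out[names(ranks)] <- ranks
  # Total rank is the element-wise sum of the per-variable ranks
  out$TotalRank <- Reduce(`+`, ranks)
  out <- out[order(out$TotalRank), , drop = FALSE]
  # n >= 0: the n lowest total ranks; n < 0: the |n| highest, in ascending order
  if (n >= 0) head(out, n) else tail(out, -n)
}

take_by_rank_base(mtcars, mpg, -qsec, n = 3)
```

On `mtcars` this should give the same `TotalRank` values as the usage examples further down, with the car names kept as row names.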
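My current solution is:

```r
library(data.table)
library(dplyr)
library(rlang)

take_by_rank <- function(df, ..., n = 100) {
  # Capture the column expressions (e.g. mpg, -qsec) as quosures
  selected_vars <- quos(...)
  if (!length(selected_vars))
    stop("No variables to rank!")
  prefix <- ".rank_"
  # Add one rank column per expression, named .rank_<expression>
  for (i in seq_along(selected_vars)) {
    rank_name <- paste0(prefix, quo_name(selected_vars[[i]]))
    df <- df %>%
      mutate(!!rank_name := frankv(!!selected_vars[[i]]))
  }
  # Sum the rank columns, sort by the total and keep n rows
  # (a negative n keeps the rows with the highest total rank)
  df %>%
    mutate(TotalRank = rowSums(select(df, starts_with(prefix)))) %>%
    arrange(TotalRank) %>%
    top_n(n, -TotalRank)
}
```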
It seems to work, but maybe I'm missing something more straightforward. It would also be nice to replace the for loop.
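One way to drop the loop in current dplyr, sketched under the assumption of dplyr 1.1.0 or later (needed for `pick()`): build a named list of rank expressions with rlang, splice it into a single `mutate()` with `!!!`, then use `slice_min()`/`slice_max()`, which supersede `top_n()`. The name `take_by_rank2` is hypothetical, `as_label()` is used instead of `quo_name()`, and `rank(x, na.last = "keep")` replaces `frankv()` to keep the same defaults (averaged ties, `NA` kept as `NA`).

```r
library(dplyr)  # assumes dplyr >= 1.1.0 for pick()
library(rlang)

# Hypothetical loop-free variant with the same interface as take_by_rank()
take_by_rank2 <- function(df, ..., n = 100) {
  vars <- enquos(...)
  if (!length(vars)) stop("No variables to rank!")
  # One rank expression per argument, named e.g. .rank_mpg or .rank_-qsec
  rank_exprs <- lapply(vars, function(v) quo(rank(!!v, na.last = "keep")))
  names(rank_exprs) <- paste0(".rank_", vapply(vars, as_label, character(1)))
  ranked <- df %>%
    mutate(!!!rank_exprs) %>%
    mutate(TotalRank = rowSums(pick(all_of(names(rank_exprs)))))
  # n >= 0: lowest total ranks; n < 0: highest, re-sorted in ascending order
  if (n >= 0) {
    slice_min(ranked, TotalRank, n = n)
  } else {
    slice_max(ranked, TotalRank, n = -n) %>% arrange(TotalRank)
  }
}

take_by_rank2(mtcars, mpg, -qsec, n = 3)
```

As with `top_n()`, `slice_min()` and `slice_max()` keep ties by default, so a call can return more than `|n|` rows; passing `with_ties = FALSE` would cap the result at exactly `|n|`.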
Usage examples (for reference):

```
take_by_rank(mtcars, mpg, qsec, n = 3)
   mpg cyl disp  hp drat   wt  qsec vs am gear carb .rank_mpg .rank_qsec TotalRank
1 13.3   8  350 245 3.73 3.84 15.41  0  0    3    4         3          3         6
2 15.0   8  301 335 3.54 3.57 14.60  0  1    5    8         6          2         8
3 14.3   8  360 245 3.21 3.57 15.84  0  0    3    4         4          5         9
```
```
take_by_rank(mtcars, mpg, qsec, n = -3)
   mpg cyl  disp hp drat    wt  qsec vs am gear carb .rank_mpg .rank_qsec TotalRank
1 22.8   4 140.8 95 3.92 3.150 22.90  1  0    4    2      24.5         32      56.5
2 32.4   4  78.7 66 4.08 2.200 19.47  1  1    4    1      31.0         27      58.0
3 33.9   4  71.1 65 4.22 1.835 19.90  1  1    4    1      32.0         28      60.0
```
```
take_by_rank(mtcars, mpg, -qsec, n = 3)
   mpg cyl disp  hp drat    wt  qsec vs am gear carb .rank_mpg .rank_-qsec TotalRank
1 18.1   6  225 105 2.76 3.460 20.22  1  0    3    1      14.0           2      16.0
2 10.4   8  472 205 2.93 5.250 17.98  0  0    3    4       1.5          15      16.5
3 10.4   8  460 215 3.00 5.424 17.82  0  0    3    4       1.5          16      17.5
```