Map over list of dataframes and apply custom mutate-function (purrr, dplyr)

Question

So I have this list:

list(`0` = structure(list(fn = 0L, fp = 34L, tn = 0L, tp = 34L), row.names = c(NA, 
-1L), class = c("tbl_df", "tbl", "data.frame")), `0.1` = structure(list(
    fn = 1L, fp = 26L, tn = 8L, tp = 33L), row.names = c(NA, 
-1L), class = c("tbl_df", "tbl", "data.frame")), `0.2` = structure(list(
    fn = 3L, fp = 22L, tn = 12L, tp = 31L), row.names = c(NA, 
-1L), class = c("tbl_df", "tbl", "data.frame")), `0.3` = structure(list(
    fn = 5L, fp = 7L, tn = 27L, tp = 29L), row.names = c(NA, 
-1L), class = c("tbl_df", "tbl", "data.frame")), `0.4` = structure(list(
    fn = 5L, fp = 3L, tn = 31L, tp = 29L), row.names = c(NA, 
-1L), class = c("tbl_df", "tbl", "data.frame")), `0.5` = structure(list(
    fn = 7L, fp = 1L, tn = 33L, tp = 27L), row.names = c(NA, 
-1L), class = c("tbl_df", "tbl", "data.frame")), `0.6` = structure(list(
    fn = 8L, fp = 0L, tn = 34L, tp = 26L), row.names = c(NA, 
-1L), class = c("tbl_df", "tbl", "data.frame")), `0.7` = structure(list(
    fn = 8L, fp = 0L, tn = 34L, tp = 26L), row.names = c(NA, 
-1L), class = c("tbl_df", "tbl", "data.frame")), `0.8` = structure(list(
    fn = 8L, fp = 0L, tn = 34L, tp = 26L), row.names = c(NA, 
-1L), class = c("tbl_df", "tbl", "data.frame")), `0.9` = structure(list(
    fn = 30L, fp = 0L, tn = 34L, tp = 4L), row.names = c(NA, 
-1L), class = c("tbl_df", "tbl", "data.frame")), `1` = structure(list(
    fn = 34L, fp = 0L, tn = 34L, tp = 0L), row.names = c(NA, 
-1L), class = c("tbl_df", "tbl", "data.frame")))

It is a list of length 11, obtained by applying a quantile-regression model at 11 different quantiles (0 to 1 in steps of 0.1). Each element is a data frame containing the true/false positive/negative counts. Now I would like to write a function that can "dynamically" compute the various metrics that can be derived from these counts. The first element, for example, looks like this:

> cms[[1]]
# A tibble: 1 x 4
     fn    fp    tn    tp
  <int> <int> <int> <int>
1     0    34     0    34

As it is a list I really wanted to do something with purrr's map or lapply or something similar. I then thought: Well some day I want the True Positive Rate and some day I maybe want the Specificity. Hence, I thought I would write a function, that could take some of the columns as input and do a "classic" dplyr::mutate. But once again I am stuck with my knowledge about tidy evaluation. So I did something like this (and please don't judge it):

fun = function(...){
  f = rlang::enexpr(...)
  return(f)
}

fpr = fun(tp / tp + fn)

# does not work
map(cms, ~mutate(.x, fpr=fpr)) 

# this (non-tidy-eval) works
map(cms, ~mutate(.x, fpr=tp / tp + fn))

I would really like to dynamically pass in columns and compute the result using tidy-evaluation. I thus would appreciate a lot any help or pointer:)

Is there a reason your data has to remain a list of data frames? Tidy verbs will always be awkward with a nested data structure; the more idiomatic way to work with data like this would be to bind the data frames together into one (adding a `quantile` column to identify where the rows came from) and use normal mutate functions. Even if you need the data in a list, it might be easier to do this and then use `split()` or `group_split()` to restore the original structure. — Joe Roe, Jun 22 '21 at 12:55
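
A minimal sketch of the approach suggested in this comment, assuming a two-element stand-in for the list from the question (`cms_small`, `combined`, `restored` and the column name `tpr` are hypothetical names chosen for illustration):

```r
library(dplyr)

# Two-element stand-in for the `cms` list from the question
cms_small <- list(
  `0`   = tibble(fn = 0L, fp = 34L, tn = 0L, tp = 34L),
  `0.1` = tibble(fn = 1L, fp = 26L, tn = 8L, tp = 33L)
)

# Stack the data frames; the list names become the `quantile` column
combined <- cms_small %>%
  bind_rows(.id = "quantile") %>%
  mutate(tpr = tp / (tp + fn))

# If a list is still needed afterwards, split by quantile again
restored <- split(combined, combined$quantile)
```

Each element of `restored` keeps the `quantile` column alongside the new metric, so the list can be re-bound later without losing track of where the rows came from.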

Anoushiravan R · Accepted Answer · 2021-06-22T16:20:37.310

You can also use the following solution.

First, we define a function that takes a data set and any number of additional arguments. The data set is passed explicitly through the data argument, and all other arguments are captured through ...
We then use enquos(), which returns a list of quosures, to defuse the expressions captured through ... The big bang operator !!!, which is normally used to splice a list of arguments, injects them into eval_tidy(), which evaluates them in the context of our data set data.
Finally, we iterate over each element of the list and apply the function to every one of them, evaluating the desired expression each time.

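library(dplyr)
library(purrr)
library(rlang)

# Note: inside the expression, `fn` refers to the column, not to this function
fn <- function(data, ...) {
  # Capture the expressions passed through ... as quosures
  args <- enquos(...)

  data %>%
    # Evaluate the captured expression with the columns of `data` in scope
    mutate(out = eval_tidy(!!!args, data = data))
}

# `cms` is the list of data frames from the question
cms %>%
  map_dfr(~ .x %>% fn(tp / (tp + fn)))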

# A tibble: 11 x 5
      fn    fp    tn    tp   out
   <int> <int> <int> <int> <dbl>
 1     0    34     0    34 1    
 2     1    26     8    33 0.971
 3     3    22    12    31 0.912
 4     5     7    27    29 0.853
 5     5     3    31    29 0.853
 6     7     1    33    27 0.794
 7     8     0    34    26 0.765
 8     8     0    34    26 0.765
 9     8     0    34    26 0.765
10    30     0    34     4 0.118
11    34     0    34     0 0
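
As a hedged variation on the function above (not the answerer's code): if the new columns should carry their own names, one option is to forward `...` straight to mutate(), which does the capturing itself. The names `add_metrics`, `cms_small`, `tpr` and `specificity` are hypothetical and chosen for illustration.

```r
library(dplyr)
library(purrr)

# Two-element stand-in for the `cms` list from the question
cms_small <- list(
  `0`   = tibble(fn = 0L, fp = 34L, tn = 0L, tp = 34L),
  `0.1` = tibble(fn = 1L, fp = 26L, tn = 8L, tp = 33L)
)

# Hypothetical helper: forwards named expressions directly to mutate()
add_metrics <- function(data, ...) {
  mutate(data, ...)
}

metrics <- cms_small %>%
  map(~ add_metrics(.x,
                    tpr = tp / (tp + fn),
                    specificity = tn / (tn + fp))) %>%
  bind_rows(.id = "quantile")
```

Because mutate() already defuses its arguments, no explicit enquos() or eval_tidy() call is needed in this variant; the trade-off is that the helper adds nothing beyond what mutate() does on its own.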

This is what I was looking for! Thank you very, very much!! Now I just need to understand what's happening here:) — Lenn, Jun 23 '21 at 10:08
@Lenn You're welcome. For more information you can start by reading ?rlang::`nse-force` or https://tidyeval.tidyverse.org/multiple.html; they really helped me wrap my head around it. — Anoushiravan R, Jun 23 '21 at 13:01
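
A minimal sketch of the force operator mentioned in the comment above. The attempt in the question hands the stored expression to mutate() as an ordinary object; unquoting it with !! injects it into the call instead. `cms_small`, `tpr_expr` and `result` are hypothetical names, and the formula is parenthesized as tp / (tp + fn) (the true positive rate), because tp / tp + fn would be parsed as (tp / tp) + fn.

```r
library(dplyr)
library(purrr)

# Two-element stand-in for the `cms` list from the question
cms_small <- list(
  `0`   = tibble(fn = 0L, fp = 34L, tn = 0L, tp = 34L),
  `0.1` = tibble(fn = 1L, fp = 26L, tn = 8L, tp = 33L)
)

# Store the formula once as an unevaluated expression
tpr_expr <- rlang::expr(tp / (tp + fn))

# `!!` injects the stored expression into mutate(), where it is
# evaluated against the columns of each data frame
result <- map(cms_small, ~ mutate(.x, tpr = !!tpr_expr))
```

The same stored expression can then be reused across calls, and swapping in another metric only requires defining another expression.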

score 1 · Answer 2 · answered Jun 22 '21 at 13:25

I'm not sure if I understood you right, but you could define your parameter calculation like this:

fpr <- \(...) with(list(...), tp / (tp + fn))

Then define a helper function:

add_param <- \(f, ...) tibble::tibble(..., "{substitute(f)}" := f(...))

Finally, invoke it via pmap():

library(purrr)

cms %>%
  dplyr::bind_rows() %>%
  pmap_dfr(add_param, fpr)

Returns:

# A tibble: 11 x 5
      fn    fp    tn    tp   fpr
   <int> <int> <int> <int> <dbl>
 1     0    34     0    34 1    
 2     1    26     8    33 0.971
 3     3    22    12    31 0.912
 4     5     7    27    29 0.853
 5     5     3    31    29 0.853
 6     7     1    33    27 0.794
 7     8     0    34    26 0.765
 8     8     0    34    26 0.765
 9     8     0    34    26 0.765
10    30     0    34     4 0.118
11    34     0    34     0 0

(Data used:)

cms <- list(`0` = structure(list(fn = 0L, fp = 34L, tn = 0L, tp = 34L), row.names = c(NA, -1L), class = c("tbl_df", "tbl", "data.frame")), `0.1` = structure(list( fn = 1L, fp = 26L, tn = 8L, tp = 33L), row.names = c(NA, -1L), class = c("tbl_df", "tbl", "data.frame")), `0.2` = structure(list( fn = 3L, fp = 22L, tn = 12L, tp = 31L), row.names = c(NA, -1L), class = c("tbl_df", "tbl", "data.frame")), `0.3` = structure(list( fn = 5L, fp = 7L, tn = 27L, tp = 29L), row.names = c(NA, -1L), class = c("tbl_df", "tbl", "data.frame")), `0.4` = structure(list( fn = 5L, fp = 3L, tn = 31L, tp = 29L), row.names = c(NA, -1L), class = c("tbl_df", "tbl", "data.frame")), `0.5` = structure(list( fn = 7L, fp = 1L, tn = 33L, tp = 27L), row.names = c(NA, -1L), class = c("tbl_df", "tbl", "data.frame")), `0.6` = structure(list( fn = 8L, fp = 0L, tn = 34L, tp = 26L), row.names = c(NA, -1L), class = c("tbl_df", "tbl", "data.frame")), `0.7` = structure(list( fn = 8L, fp = 0L, tn = 34L, tp = 26L), row.names = c(NA, -1L), class = c("tbl_df", "tbl", "data.frame")), `0.8` = structure(list( fn = 8L, fp = 0L, tn = 34L, tp = 26L), row.names = c(NA, -1L), class = c("tbl_df", "tbl", "data.frame")), `0.9` = structure(list( fn = 30L, fp = 0L, tn = 34L, tp = 4L), row.names = c(NA, -1L), class = c("tbl_df", "tbl", "data.frame")), `1` = structure(list( fn = 34L, fp = 0L, tn = 34L, tp = 0L), row.names = c(NA, -1L), class = c("tbl_df", "tbl", "data.frame")))

score 0 · Answer 3 · answered Jun 22 '21 at 13:53

You can also create a more general function and control the flow using switch(). You can add more measures as necessary. In the simple example below, the input can be either a dataframe in a list column or four columns of numbers.

library(tidyverse)

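my_fun_1 <- function(dat, measure = c("fp_rate", "fn_rate")) {
  # Resolve `measure` to a single valid choice (also handles the default vector)
  measure <- match.arg(measure)
  switch(
    measure,
    fp_rate = dat[["fp"]] / (dat[["fp"]] + dat[["tn"]]),
    fn_rate = dat[["fn"]] / (dat[["fn"]] + dat[["tp"]])
  )
}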

dat1 <- cms %>%  # `cms` is the list of data frames from the question
  enframe() %>%
  rowwise() %>%
  mutate(
    fnr = my_fun_1(value, "fn_rate"),
    fpr = my_fun_1(value, "fp_rate"),
  ) %>%
  ungroup()

dat1

# # A tibble: 11 x 4
#    name  value               fnr    fpr
#    <chr> <list>            <dbl>  <dbl>
#  1 0     <tibble [1 x 4]> 0      1
#  2 0.1   <tibble [1 x 4]> 0.0294 0.765
#  3 0.2   <tibble [1 x 4]> 0.0882 0.647
# <Omitted>

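my_fun_2 <- function(fn, fp, tn, tp, measure = c("fp_rate", "fn_rate")) {
  # Resolve `measure` to a single valid choice (also handles the default vector)
  measure <- match.arg(measure)
  switch(measure,
    fp_rate = fp / (fp + tn),
    fn_rate = fn / (fn + tp)
  )
}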

dat2 <- cms %>%  # `cms` is the list of data frames from the question
  bind_rows(.id = "quantile") %>%
  mutate(
    fnr = my_fun_2(fn, fp, tn, tp, "fn_rate"),
    fpr = my_fun_2(fn, fp, tn, tp, "fp_rate")
  )

dat2

# # A tibble: 11 x 7
#    quantile    fn    fp    tn    tp    fnr    fpr
#    <chr>    <int> <int> <int> <int>  <dbl>  <dbl>
#  1 0            0    34     0    34 0      1
#  2 0.1          1    26     8    33 0.0294 0.765
#  3 0.2          3    22    12    31 0.0882 0.647
# <Omitted>
