Applying simple function via across within nested data on each group

Question

Background

Given nested data, I would like to apply a simple function using across on an arbitrary selection of columns. Using across I want to iterate over the selection of columns passed to one argument of the function and keep the second argument constant.

Example

# Using across within nested data frame

# Gapminder data from gapminder package
library("tidyverse")
data("gapminder", package = "gapminder")

# Sample function
sample_function <- function(.data, var_a, var_b) {
    var_a <- enquo(var_a)
    var_b <- enquo(var_b)
    .data %>%
        mutate(some_res = log(!!var_a) + !!var_b) %>%
        pull(some_res)
}


# Basic example, not working
gapminder %>%
    group_by(country, continent) %>%
    nest() %>%
    mutate(sample_res = map(
        .x = data,
        .f = across(
            .cols = vars(year, lifeExp, pop),
            .fns = ~ sample_function(var_a = .x),
            var_b = gdpPercap
        )
    )) %>%
    unnest(sample_res)

The example fails with the following error:

Error: Problem with mutate() input sample_res. x Must subset columns with a valid subscript vector. x Subscript has the wrong type quosures. ℹ It must be numeric or character. ℹ Input sample_res is map(...). ℹ The error occured in group 1: country = "Afghanistan", continent = "Asia". Run rlang::last_error() to see where the error occurred.

Desired results

I can iterate over the selected columns always passing a different argument in var_a. In this case the values reflect year, lifeExp and gdpPercap variables.

gapminder %>%
    group_by(country, continent) %>%
    nest() %>%
    mutate(
        res_year = map(.x = data, 
                       .f = sample_function, var_a = year, var_b = gdpPercap),
        res_lifeExp = map(.x = data, 
                          .f = sample_function, var_a = lifeExp, 
                          var_b = gdpPercap),
        res_pop = map(.x = data, 
                      .f = sample_function, var_a = pop, var_b = gdpPercap)
    )

Sought solution

The solution obtained in desired results is rather impractical and error-prone as forces new row for each variable. I would like to find a combination of using across and map so I can run different variations of the mapping function only by adding variables to across.

In the `across`, you are using `sample_function(var_a = .x)`, here, I suppose it is the value that is getting passed and not the column name — akrun, Jun 03 '20 at 21:31
do you accept non tidy evaluation answers? It is really though to pass a tidy eval though the kind of loop that I am imagining — Bruno, Jun 03 '20 at 23:32
Inspired by @Brunos approach, I updated my answer below for one more approach using `nest_by`. — TimTeaFan, Jun 04 '20 at 22:55

TimTeaFan · Accepted Answer · 2020-06-04T22:52:53.873

Final update (using nest_by & across)

Inspired by @Brunos answer, I revised my approach to use nest_by / rowwise instead of map (which is, I guess, the new recommended way of wrangling nested tibbles).

The result of my original answer can be easily reproduced using nest_by:

gapminder %>%
  nest_by(country, continent) %>%
  mutate(sample_res = list(transmute(data,
                                     across(c(year, lifeExp, pop),
                                            ~ sample_function(data, var_a = .x, var_b = gdpPercap))
  ))
  )

However, it returns one list-column containing tibbles. If the output were normal vectors we could just remove sample_res = list() and new columns would be added to your existing tibble. However, in this example the output of each new column is a list-column containing vectors. I have not managed to produce this output in one call to mutate(across(...)).

It is possible though to use unnest and then another call to summarise(across(...)) to get the job done.

gapminder %>%
  nest_by(country, continent) %>%
  mutate(sample_res = list(transmute(data,
                             across(c(year, lifeExp, pop),
                                    ~ sample_function(data, var_a = .x, var_b = gdpPercap))
                      ))
         ) %>% 
  unnest(cols = sample_res) %>%
  summarise(across(c(year, lifeExp, pop), list, .names = "res_{col}"))

Original answer (using group_by, nest, map & across)

You misspecified sample_function in your across call. It should be

function(x) sample_function(.x, var_a = x, var_b = gdpPercap)

instead of

~ sample_function(var_a = .x),
                var_b = gdpPercap

Since you are nesting map and mutate(across(...)), I prefer to have at least one "normal" anonymous functions instead of the lamda ~ notation. Otherwise, things can get confusing with two .xs.

Further across should be called inside its own separate mutate.

This should work:

library("tidyverse")
data("gapminder", package = "gapminder")

# Sample function
sample_function <- function(.data, var_a, var_b) {
  var_a <- enquo(var_a)
  var_b <- enquo(var_b)

  .data %>%
    mutate(some_res = log(!!var_a) + !!var_b) %>%
    pull(some_res)
}

gapminder %>%
  group_by(country, continent) %>%
  nest() %>%  
  mutate(sample_res = map(
    data,
    ~ mutate(.x, across(c(year, lifeExp, pop),
                       function(x) { 
                         sample_function(.x, var_a = x, var_b = gdpPercap)
                        }
                       )
    )
   )
  )
#> # A tibble: 142 x 4
#> # Groups:   country, continent [142]
#>    country     continent data              sample_res       
#>    <fct>       <fct>     <list>            <list>           
#>  1 Afghanistan Asia      <tibble [12 × 4]> <tibble [12 × 4]>
#>  2 Albania     Europe    <tibble [12 × 4]> <tibble [12 × 4]>
#>  3 Algeria     Africa    <tibble [12 × 4]> <tibble [12 × 4]>
#>  4 Angola      Africa    <tibble [12 × 4]> <tibble [12 × 4]>
#>  5 Argentina   Americas  <tibble [12 × 4]> <tibble [12 × 4]>
#>  6 Australia   Oceania   <tibble [12 × 4]> <tibble [12 × 4]>
#>  7 Austria     Europe    <tibble [12 × 4]> <tibble [12 × 4]>
#>  8 Bahrain     Asia      <tibble [12 × 4]> <tibble [12 × 4]>
#>  9 Bangladesh  Asia      <tibble [12 × 4]> <tibble [12 × 4]>
#> 10 Belgium     Europe    <tibble [12 × 4]> <tibble [12 × 4]>
#> # … with 132 more rows

^{Created on 2020-06-03 by the reprex package (v0.3.0)}

When using map with custom functions to loop over tibbles in a list-column it is quite helpful to build a first version outside of the loop.

test_dat <- gapminder %>%
  nest_by(country, continent) 

test_dat$data[[1]] %>% 
  mutate(across(
    c(year, lifeExp, pop),
    ~ sample_function(test_dat$data[[1]], var_a = .x, var_b = gdpPercap)
    )
    )

Once this works, then the final step is to replace the object you want to loop over with .x.

Another approach (part of original answer)

Another approach would be to rewrite your original sample_function and include across in your mutate call there. We could make it take a string vector of variable names that will be passed to across. I might like this approach more, because its more flexible. Now you could have another list-column containing different variable names for different subsets of your data and loop over them and your data column with map2.

library("tidyverse")
data("gapminder", package = "gapminder")

sample_function2 <- function(.data, .vars, var_b) {
  .vars <- syms(.vars)
  var_b <- enquo(var_b)

  .data %>%
    mutate(across(c(!!!.vars), function(y) log(y) + !!var_b))
}


gapminder %>%
  group_by(country, continent) %>%
  nest() %>% 
  mutate(sample_res = map(
    data,
    ~ sample_function2(.x,
                       .vars = c("year", "lifeExp", "pop"),
                       var_b = gdpPercap)
  )
  )

#> # A tibble: 142 x 4
#> # Groups:   country, continent [142]
#>    country     continent data              sample_res       
#>    <fct>       <fct>     <list>            <list>           
#>  1 Afghanistan Asia      <tibble [12 × 4]> <tibble [12 × 4]>
#>  2 Albania     Europe    <tibble [12 × 4]> <tibble [12 × 4]>
#>  3 Algeria     Africa    <tibble [12 × 4]> <tibble [12 × 4]>
#>  4 Angola      Africa    <tibble [12 × 4]> <tibble [12 × 4]>
#>  5 Argentina   Americas  <tibble [12 × 4]> <tibble [12 × 4]>
#>  6 Australia   Oceania   <tibble [12 × 4]> <tibble [12 × 4]>
#>  7 Austria     Europe    <tibble [12 × 4]> <tibble [12 × 4]>
#>  8 Bahrain     Asia      <tibble [12 × 4]> <tibble [12 × 4]>
#>  9 Bangladesh  Asia      <tibble [12 × 4]> <tibble [12 × 4]>
#> 10 Belgium     Europe    <tibble [12 × 4]> <tibble [12 × 4]>
#> # … with 132 more rows

^{Created on 2020-06-04 by the reprex package (v0.3.0)}

Add on (to original answer)

As @Bruno points out that the approaches above are not in the format specified by the OP, here is an alternative solution building on my second approach above, which should yield the desired output.

library("tidyverse")
data("gapminder", package = "gapminder")

sample_function2 <- function(.data, .vars, var_b) {
  .vars <- syms(.vars)
  var_b <- enquo(var_b)

  .data %>%
    transmute(across(c(!!!.vars), function(y) log(y) + !!var_b)) %>% 
    unlist()

}

my_vars <- c("year", "lifeExp", "pop")

gapminder %>%
  group_by(country, continent) %>%
  nest() %>% 
  crossing(vars = my_vars) %>% 
  mutate(sample_res = map2(
    data,
    vars, 
    ~ sample_function2(.x,
                       .vars = .y,
                       var_b = gdpPercap)
  )
  ) %>% 
  pivot_wider(names_from = vars,
              names_prefix = "res_",
              values_from = sample_res) 

#> # A tibble: 142 x 6
#>    country     continent data              res_lifeExp res_pop    res_year  
#>    <fct>       <fct>     <list>            <list>      <list>     <list>    
#>  1 Afghanistan Asia      <tibble [12 × 4]> <dbl [12]>  <dbl [12]> <dbl [12]>
#>  2 Albania     Europe    <tibble [12 × 4]> <dbl [12]>  <dbl [12]> <dbl [12]>
#>  3 Algeria     Africa    <tibble [12 × 4]> <dbl [12]>  <dbl [12]> <dbl [12]>
#>  4 Angola      Africa    <tibble [12 × 4]> <dbl [12]>  <dbl [12]> <dbl [12]>
#>  5 Argentina   Americas  <tibble [12 × 4]> <dbl [12]>  <dbl [12]> <dbl [12]>
#>  6 Australia   Oceania   <tibble [12 × 4]> <dbl [12]>  <dbl [12]> <dbl [12]>
#>  7 Austria     Europe    <tibble [12 × 4]> <dbl [12]>  <dbl [12]> <dbl [12]>
#>  8 Bahrain     Asia      <tibble [12 × 4]> <dbl [12]>  <dbl [12]> <dbl [12]>
#>  9 Bangladesh  Asia      <tibble [12 × 4]> <dbl [12]>  <dbl [12]> <dbl [12]>
#> 10 Belgium     Europe    <tibble [12 × 4]> <dbl [12]>  <dbl [12]> <dbl [12]>
#> # … with 132 more rows

^{Created on 2020-06-04 by the reprex package (v0.3.0)}

I don't think that this is the right answer, ideally there would be 3 extra columns like the example answer — Bruno, Jun 03 '20 at 22:45
You are right, I didn't notice that the desired output is three different columns. Please see my revised answer. — TimTeaFan, Jun 03 '20 at 23:55

Bruno · Answer 2 · 2020-06-04T00:00:06.920

Here you go, not fancy but gets the job done

library("tidyverse")
data("gapminder", package = "gapminder")

# Sample function

sample_function <- function(.data,vars_a,var_b){
  var_b <- rlang::parse_expr(var_b)

  for (i in vars_a) {

    namer <- paste0("res_",i)
    var_a <- rlang::parse_expr(i)
    .data <- .data %>%
      mutate(!!namer := log(!!var_a) + !!var_b)
  }
  .data


}
sample_function(gapminder,c("year","lifeExp","pop"),"gdpPercap")


gapminder %>% 
  nest_by(country,continent) %>% 
  mutate(result = list(sample_function(data,c("year","lifeExp","pop"),"gdpPercap")))

Here is the slower tidy way

tidy_sample_function <- function(.data,vars_a,var_b){

  vars_a <- .data %>% 
    select({{vars_a}}) %>% 
    names()

  for (i in vars_a) {

    namer <- paste0("res_",i)
    var_a <- rlang::parse_expr(i)
    .data <- .data %>%
      mutate(!!namer := log(!!var_a) + {{var_b}})
  }
  .data


}

gapminder %>% 
  nest_by(country,continent) %>% 
  mutate(result = list(tidy_sample_function(data,c(year,lifeExp,pop),gdpPercap)))

Applying simple function via across within nested data on each group

Background

Example

Desired results

Sought solution

2 Answers2