
I am quite new to R and therefore cannot explain the following behavior.

So, let's assume I have this data structure:

> use_data
     ID value1             value2             value3 Time               
  <int> <list>             <list>             <list> <dttm>             
1    52 <tibble [96 x 25]> <tibble [59 x 26]> <NULL> 2012-04-25 03:00:00
2    24 <NULL>             <tibble [30 x 26]> <NULL> 2012-07-18 13:45:00

The following example is deliberately simple, but it demonstrates the problem: I want to use purrr::pmap() to iterate over several columns in parallel. For now, the function only returns the ID, which becomes the new column target:

library(dplyr)
library(tidyr)
library(purrr)

fun_example <- function(df) {
  result_df <- df %>% mutate(target = purrr::pmap(
    .l = list(ID, value1, value2),
    .f = function(x, y, z) {
      return(x)
    }
  )) %>% unnest(target)
}
(fun_example(use_data))

As intended, this returns:

     ID value1             value2             value3 Time                target
  <int> <list>             <list>             <list> <dttm>               <int>
1    52 <tibble [96 x 25]> <tibble [59 x 26]> <NULL> 2012-04-25 03:00:00     52
2    24 <NULL>             <tibble [30 x 26]> <NULL> 2012-07-18 13:45:00     24
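A minimal sketch of what pmap() does with its .l argument, using made-up vectors rather than use_data: it calls .f once per position, passing the i-th element of each vector, and returns a list with one element per position. That length is what mutate() later checks against the size of the data (or of each group).

```r
library(purrr)

# pmap() walks the list elements in parallel: .f is called once per position,
# with the i-th element of each vector as its arguments.
out <- pmap(list(c(52L, 24L), c("a", "b")), function(x, y) paste(x, y))

# The result has one element per position, i.e. the common length of the inputs.
length(out)
```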

Now I want to build the list to iterate over in advance, by defining cols <- list(df$ID, df$value1, df$value2), and then run:

fun_example <- function(df) {
  result_df <- df %>% mutate(target = purrr::pmap(
    .l = cols,
    .f = function(x, y, z) {
      return(x)
    }
  )) %>% unnest(target)
}
(fun_example(use_data))

However, this gives me the following error:

Problem while computing `target = purrr::pmap(...)`.
x `target` must be size 1, not 2.
i The error occurred in group 1: ID = 24.

I suspect that pmap() no longer iterates over the data I expect. So I have two questions:

  1. Can someone explain what is happening in this example?
  2. Besides passing the columns themselves, is there a way to pass the column names as strings, e.g. cols <- list("ID", "value1", "value2")?
mAI
  • You have a "data.frame" that technically has columns, but each value is another possibly complex object (a list, a tibble). This data structure is for sure not for the faint of heart. – Roman Luštrik Apr 23 '22 at 08:48
  • Unfortunately, your issue is hard to reproduce. Have you checked that your data isn't grouped? Grouping may result in the error you mentioned. – stefan Apr 23 '22 at 08:50
  • I get your point. However, why does this work with setting `.l=list(ID, value1, value2)` - how is this different from accessing the columns with `$ID, $value1, $value2` in advance? Shouldn't this give the same result? – mAI Apr 23 '22 at 08:58
  • Well, yes. On ungrouped data both approaches give you the same result. However, when your dataset is grouped, your data gets split under the hood. Using `.l=list(ID, value1, value2)` you map over the split data. Using `$ID, $value1, $value2` ignores the grouping, so you map over the unsplit data. Hence the result is a list of length `nrow(df)`, not of the group size, and mutate complains with something like `must be size 1, not 2.` Check this with `df %>% ungroup() %>% mutate(...`. – stefan Apr 23 '22 at 09:59
  • Thanks for that explanation, I will check the grouping status! – mAI Apr 23 '22 at 10:31
  • ungrouping my dataframe does the job - thanks man! – mAI Apr 23 '22 at 10:40
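The grouping explanation can be reproduced with a small sketch. The data frame below is a hypothetical stand-in for use_data (only ID plus a made-up numeric column x, no list columns), grouped by ID so that each group has one row, as the error message suggests. Because group_by() sorts the groups, ID = 24 becomes group 1, which matches the reported error; the exact wording of the error depends on the dplyr version.

```r
library(dplyr)
library(purrr)

# Hypothetical stand-in for use_data: ID plus one simple column x,
# grouped by ID so that every group has exactly one row.
df <- tibble(ID = c(52L, 24L), x = c(1, 2)) %>% group_by(ID)

# Inside the data mask, ID and x refer to the current group's slice (size 1),
# so pmap() returns a length-1 list per group and mutate() accepts it.
res_mask <- df %>% mutate(target = pmap(list(ID, x), function(a, b) a))

# df$ID and df$x are extracted before mutate() runs: full columns of length 2.
cols <- list(df$ID, df$x)

# For each size-1 group, pmap() now returns 2 elements, hence
# "`target` must be size 1, not 2".
res_err <- tryCatch(
  df %>% mutate(target = pmap(cols, function(a, b) a)),
  error = function(e) e
)

# After ungroup() the whole table has size 2, which matches the result length.
res_ok <- df %>% ungroup() %>% mutate(target = pmap(cols, function(a, b) a))
```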

1 Answer

library(dplyr)
library(purrr)

cols <- c("mpg", "cyl", "disp")
mtcars %>%
  mutate(target = pmap(
    # parse_exprs() turns each string into a symbol; !!! splices them into list()
    .l = list(!!!rlang::parse_exprs(cols)),
    .f = function(x, y, z) x + y + z
  ))
                     mpg cyl  disp  hp drat    wt  qsec vs am gear carb target
Mazda RX4           21.0   6 160.0 110 3.90 2.620 16.46  0  1    4    4    187
Mazda RX4 Wag       21.0   6 160.0 110 3.90 2.875 17.02  0  1    4    4    187
Datsun 710          22.8   4 108.0  93 3.85 2.320 18.61  1  1    4    1  134.8
Hornet 4 Drive      21.4   6 258.0 110 3.08 3.215 19.44  1  0    3    1  285.4
Hornet Sportabout   18.7   8 360.0 175 3.15 3.440 17.02  0  0    3    2  386.7
Valiant             18.1   6 225.0 105 2.76 3.460 20.22  1  0    3    1  249.1
Duster 360          14.3   8 360.0 245 3.21 3.570 15.84  0  0    3    4  382.3
Merc 240D           24.4   4 146.7  62 3.69 3.190 20.00  1  0    4    2  175.1
Merc 230            22.8   4 140.8  95 3.92 3.150 22.90  1  0    4    2  167.6

pmap takes a list of vectors, while inside mutate() we refer to columns by bare names (mpg, not "mpg").

So, given a character vector, we parse it into expressions with rlang::parse_exprs() and then use list() together with the !!! operator to splice (unquote) those expressions into the call, where mutate() evaluates them as columns.
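A variant of the same idea, offered as a sketch rather than the answer's own code: rlang::syms() converts each string directly into a symbol, which for plain column names yields the same symbols as parse_exprs(). pmap_dbl() is used here so that target is a plain numeric column instead of a list column. Because the spliced symbols are evaluated inside mutate()'s data mask, the approach also respects grouping, unlike building the list from df$... beforehand.

```r
library(dplyr)
library(purrr)

cols <- c("mpg", "cyl", "disp")

# syms() turns each string into a symbol; !!! splices them into list(),
# so the call mutate() evaluates is effectively list(mpg, cyl, disp).
res <- mtcars %>%
  mutate(target = pmap_dbl(list(!!!rlang::syms(cols)), function(x, y, z) x + y + z))

# The symbols are resolved inside the data mask, so with grouped data each
# group gets its own slices and the result size matches the group size.
res_grouped <- mtcars %>%
  group_by(cyl) %>%
  mutate(target = pmap_dbl(list(!!!rlang::syms(cols)), function(x, y, z) x + y + z)) %>%
  ungroup()
```

One design difference: parse_exprs() would also accept arbitrary code such as "mpg * 2", while syms() only ever creates names, which is the safer choice when the strings come from user input.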

AdroMine
  • thanks! that is a clean and nice answer! – mAI Apr 23 '22 at 10:31
  • !!! will "resolve" the list of symbols into a list of actual objects (the columns) within the mutate call, so that `pmap` receives a list containing three vectors: `mpg`, `cyl` and `disp`. You can read more about it in dplyr's "Programming with dplyr" vignette and under the topic of non-standard evaluation. – AdroMine Apr 24 '22 at 12:35
  • OK, I see. But how does this work when building `.l` manually with `.l = list(mpg, cyl, disp)`? There I don't need something like `.l = list(!!mpg, !!cyl, !!disp)`. Why? Aren't mpg, cyl and disp symbols in this context too? – mAI Apr 24 '22 at 12:38
  • That's the NSE (non-standard evaluation) in dplyr that resolves bare symbols to their actual columns. You can read more about it [here](https://dplyr.tidyverse.org/articles/programming.html) – AdroMine Apr 24 '22 at 12:56
  • But why isn't NSE applied directly to the result of `parse_exprs`? – mAI Apr 24 '22 at 14:25
  • You can read more details on this [here](https://adv-r.hadley.nz/quasiquotation.html) – AdroMine Apr 24 '22 at 14:38
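To see why !!! is needed, one can inspect the code that actually gets evaluated. The sketch below uses rlang::expr(), which supports the same !!! splicing, as a stand-in for the way mutate() captures its arguments; treating the two as equivalent is a simplification. Data-mask evaluation (NSE) applies to the symbols that appear in the captured call. Without !!!, the call contains the code parse_exprs(cols), which evaluates to a list of symbol objects. With !!!, the symbols are spliced in first, so each one is then looked up as a column, exactly as when writing list(mpg, cyl, disp) by hand.

```r
library(rlang)

cols <- c("mpg", "cyl", "disp")

# Without !!!, the captured call still contains the code parse_exprs(cols).
e_plain <- expr(list(parse_exprs(cols)))
print(e_plain)  # list(parse_exprs(cols))

# With !!!, the symbols returned by parse_exprs() are spliced into the call
# before evaluation, so the call names the columns directly.
e_spliced <- expr(list(!!!parse_exprs(cols)))
print(e_spliced)

# Evaluate both with mtcars as the data mask.
v_plain <- eval_tidy(e_plain, data = mtcars)      # a list holding a list of symbols
v_spliced <- eval_tidy(e_spliced, data = mtcars)  # a list of three numeric columns
```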