I'm trying to better understand how pmap() works within data frames, and I get a surprising result when using pmap() to compute row-wise means across several columns.

library(dplyr)  # mutate(), select(), %>%
library(purrr)  # pmap_dbl()

mtcars %>% 
  mutate(comp_var = pmap_dbl(list(vs, am, cyl), mean)) %>% 
  select(comp_var, vs, am, cyl)

In the above example, comp_var is equal to the value of vs in its row, rather than the mean of the three variables in a given row.

I know that I could get accurate results for comp_var using ...

mtcars %>% 
  rowwise() %>% 
    mutate(comp_var = mean(c(vs, am, cyl))) %>% 
    select(comp_var, vs, am, cyl) %>% 
  ungroup()

... but I want to understand how pmap() should be applied in a case like this.

Joe

1 Answer


mean() averages only its first argument, x, so the values have to be concatenated into a single vector and passed as x. From ?mean:

x: An R object. Currently there are methods for numeric/logical vectors and date, date-time and time interval objects. Complex vectors are allowed for ‘trim = 0’, only.

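So if we pass the values as separate arguments (x1, x2, x3, etc.), only the first one is matched to x. The rest go into the ... parameter (where mean.default() matches them positionally to trim and na.rm), based on the usage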

mean(x, ...)

For example:

mean(5, 8) # x is 5
#[1] 5 
mean(8, 5) # x is 8
#[1] 8
mean(c(5, 8)) # x is a vector with 2 values
#[1] 6.5
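
The same argument matching happens inside pmap(). As a minimal sketch (the short vectors below are made-up stand-ins for the mtcars columns), counting the arguments each call receives shows that pmap_dbl(list(vs, am, cyl), mean) is equivalent to calling mean(vs[i], am[i], cyl[i]) for each row:

```r
library(purrr)

# Toy stand-ins for the mtcars columns (illustrative values only)
vs  <- c(0, 1)
am  <- c(1, 1)
cyl <- c(6, 4)

# Each call receives one element from every list entry as a separate argument
n_args <- pmap_int(list(vs, am, cyl), function(...) ...length())

# With an unnamed list, the first element is matched to x in mean()
first_only <- pmap_dbl(list(vs, am, cyl), mean)

# The equivalent calls written out by hand
manual <- c(mean(vs[1], am[1], cyl[1]),
            mean(vs[2], am[2], cyl[2]))

identical(first_only, manual)
```

This is why comp_var in the question simply reproduces vs.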

In the rowwise() version, the OP concatenated the elements into a single vector with c(), whereas pmap() passes them to mean() as separate arguments, so mean() is applied to the first one only. Wrapping the arguments in c(...) fixes this:

out1 <- mtcars %>% 
         mutate(comp_var = pmap_dbl(list(vs, am, cyl), ~mean(c(...)))) %>% 
         dplyr::select(comp_var, vs, am, cyl)

Checking against the rowwise() output:

out2 <- mtcars %>% 
         rowwise() %>% 
         mutate(comp_var = mean(c(vs, am, cyl))) %>% 
         dplyr::select(comp_var, vs, am, cyl) %>% 
         ungroup()

all.equal(out1, out2, check.attributes = FALSE)
#[1] TRUE
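
As a variation on the same idea (a sketch, not part of the original comparison), pmap_dbl() can also take a subset of the data frame directly, since a data frame is a list of columns. Here `cols`, `via_pmap` and `via_rowmeans` are illustrative names, and base R's rowMeans() serves only as a cross-check:

```r
library(purrr)

# Keep only the columns to be averaged
cols <- mtcars[c("vs", "am", "cyl")]

# Because the columns are named, each call receives vs = , am = , cyl = ;
# function(...) collects them regardless of name, and c(...) builds the
# single vector that mean() expects as x
via_pmap <- pmap_dbl(cols, function(...) mean(c(...)))

# Base R cross-check; rowMeans() returns a named vector, hence unname()
via_rowmeans <- unname(rowMeans(cols))

all.equal(unname(via_pmap), via_rowmeans)
```

For a plain row mean, rowMeans() is the simpler tool; the pmap() form is more useful when the per-row function is something base R does not already vectorise.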
akrun