I'm trying to better understand functional programming in R. I'd like to stick to purrr, but I'll use rapply to demonstrate what I'm looking for below. First, a simple example of what I'm trying to understand:

You can use map to get the mean of each column of the mtcars dataset:

library(tidyverse)
mtcars %>% map_dbl(mean)

       mpg        cyl       disp         hp       drat         wt       qsec
 20.090625   6.187500 230.721875 146.687500   3.596563   3.217250  17.848750
        vs         am       gear       carb
  0.437500   0.406250   3.687500   2.812500

But how would I use purrr to map mean to mtcars split by cyl?

library(tidyverse)
mtcars_split <- mtcars %>% split(.$cyl) 
mtcars_split %>% map(mean)
$`4`
[1] NA

$`6`
[1] NA

$`8`
[1] NA

Warning messages:
1: In mean.default(.x[[i]], ...) :
  argument is not numeric or logical: returning NA
2: In mean.default(.x[[i]], ...) :
  argument is not numeric or logical: returning NA
3: In mean.default(.x[[i]], ...) :
  argument is not numeric or logical: returning NA

I understand why this doesn't work: split creates a list and now I'm trying to map mean to each element of that new list, which are data.frames. This attempt at mapping is equivalent to (correct me if necessary):

mean(mtcars_split[[1]])
mean(mtcars_split[[2]])
mean(mtcars_split[[3]])

which obviously doesn't work - you can't just take the mean of a data.frame. What I really want is something that does this:

mtcars_split[[1]] %>% map(mean)
mtcars_split[[2]] %>% map(mean)
mtcars_split[[3]] %>% map(mean)

The problem is, I just can't wrap my head around how to do this in purrr. While looking for the solution to this (seemingly very basic) problem, I found rapply, which seems to do what I want, but outside of purrr (and the output isn't exactly in the format I'd like, but that's beside the point):

rapply(mtcars_split, mean, how = "unlist")
      4.mpg       4.cyl      4.disp        4.hp      4.drat        4.wt 
 26.6636364   4.0000000 105.1363636  82.6363636   4.0709091   2.2857273 
     4.qsec        4.vs        4.am      4.gear      4.carb       6.mpg 
 19.1372727   0.9090909   0.7272727   4.0909091   1.5454545  19.7428571 
      6.cyl      6.disp        6.hp      6.drat        6.wt      6.qsec 
  6.0000000 183.3142857 122.2857143   3.5857143   3.1171429  17.9771429 
       6.vs        6.am      6.gear      6.carb       8.mpg       8.cyl 
  0.5714286   0.4285714   3.8571429   3.4285714  15.1000000   8.0000000 
     8.disp        8.hp      8.drat        8.wt      8.qsec        8.vs 
353.1000000 209.2142857   3.2292857   3.9992143  16.7721429   0.0000000 
       8.am      8.gear      8.carb 
  0.1428571   3.2857143   3.5000000 

rapply being recursive apply is obviously a key to my answer - I believe I need nested maps - one to extract each column of the three data.frames in my mtcars_split, then one to run mean on each extracted column. However, I haven't been able to make that work.

I think this is addressed by Jenny Bryan in her purrr tutorial where she uses a map() inside a map(), but I can't follow what she is doing. She notes that the example might not be explained adequately earlier in the tutorial and I've asked her for elaboration here, but no answer yet (I know she is busy!).

twgardner2
  • You can use `group_by` of **dplyr**. – F. Privé Sep 24 '17 at 16:42
  • `mtcars %>% split(.$cyl) %>% map(map, mean)` – missuse Sep 24 '17 at 16:56
  • If you're starting with data.frames, you should only really use purrr to operate within a list column; otherwise there's always a better way to do it with dplyr and tidyr, e.g. `mtcars %>% group_by(cyl) %>% summarise_all(mean)`. However, there _are_ times you want to iterate across the second level of a nested list, at which point you can use `modify_depth`. Putting `map` inside `map` is possible, but if you're using the abbreviated `~`/`.x`-style anonymous functions, it can get confusing. – alistaire Sep 24 '17 at 19:40
  • @alistaire Yes, I'm comfortable doing this with dplyr on data.frames. I'm asking the question, just as you mention, for working on lower levels of nested lists. I used mtcars for simplicity of the example. I will check out modify_depth. Thanks for your comment. – twgardner2 Sep 24 '17 at 20:03
  • The problem is that `mtcars` _isn't_ a nested list; it's a list of depth 1. – alistaire Sep 24 '17 at 20:04
  • Sure, but after you put it through split, it becomes a list of three data.frames. It may not be a perfect example, but it demonstrates what I'm trying to understand. – twgardner2 Sep 24 '17 at 20:07
  • You can do `mtcars %>% split(.$cyl) %>% map_df(map, mean)`, but it's a more verbose and less efficient idiom for rectangular data than keeping it rectangular. – alistaire Sep 24 '17 at 20:14
  • Yes, I understand that. I'm not asking this so that I can work on dataframes. Like I said above, I'm more than comfortable doing this using dplyr on a rectangular data structure. mtcars is merely to demonstrate the question. I'm asking to understand how this works on an actual nested list. Rather than coerce what I'm working on into a minimal working example, I'm asking how to map to the nested dataframes you get after splitting mtcars on cyl. Thank you for pointing me toward modify_depth. I'll check that out. – twgardner2 Sep 24 '17 at 20:22
  • What is the desired output format? `mtcars_split %>% map(~summarise_all(.x, mean))` or `mtcars_split %>% modify_depth(2, mean)` might be what you want. – IceCreamToucan Jul 27 '18 at 14:19

1 Answer

The recipe for this kind of problem is always the same:

Decompose the problem, solve it for an individual case, and then put it back together inside out.

As you observed, mtcars %>% split(.$cyl) gives you a list of lists (list of data.frames). You want to map mean over the inner lists.

So let’s do it for one list first:

mtcars_split[[1]] %>% map_dbl(mean)
# Or, equivalently:
map_dbl(mtcars_split[[1]], mean)

This works. We’ve decomposed the problem and successfully solved it for an individual case: In other words, given a list x and a transformation f, we’ve solved the problem for x[[1]] by executing f(x[[1]]) (which is equivalent to x[[1]] %>% f()).

Time to generalise it to all cases. And we already know how to generalise a transformation of an element x[[1]] to a whole list x: use map on that list:

x %>% map(~ .x %>% f())
# or, equivalently:
x %>% map(~ f(.x))
# or, equivalently:
map(x, ~ f(.x))
# or, finally:
map(x, f)
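
To make these equivalences concrete, here is a minimal sketch on a toy list. The names `x`, `f`, and `r1` to `r4` are made-up stand-ins for illustration and are not from the question; it assumes the purrr and magrittr packages are installed.

```r
library(purrr)
library(magrittr)

# Hypothetical stand-ins for the generic `x` and `f` above.
x <- list(a = 1:3, b = 4:6)
f <- function(v) sum(v)

r1 <- x %>% map(~ .x %>% f())
r2 <- x %>% map(~ f(.x))
r3 <- map(x, ~ f(.x))
r4 <- map(x, f)

# All four spellings produce the same list.
identical(r1, r2) && identical(r2, r3) && identical(r3, r4)
```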

Let’s do the exact same thing, with x and f substituted by mtcars_split and map_dbl(mean), respectively:

mtcars_split %>% map(~ .x %>% map_dbl(mean))
# or, equivalently:
mtcars_split %>% map(~ map_dbl(.x, mean))

And this can be simplified the same way as our example above:

mtcars_split %>% map(map_dbl, mean)
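
Putting the pieces together, the following is a small self-contained sketch that runs the short form on the split data and compares it with the formula version. It uses only `mtcars`, `split`, and the purrr functions already mentioned; the variable names `by_cyl_short` and `by_cyl_formula` are illustrative.

```r
library(purrr)
library(magrittr)

mtcars_split <- mtcars %>% split(.$cyl)

# The outer map walks the three data.frames;
# the inner map_dbl walks the columns of each one.
by_cyl_short   <- mtcars_split %>% map(map_dbl, mean)
by_cyl_formula <- mtcars_split %>% map(~ map_dbl(.x, mean))

# Both spellings should give the same list of named numeric vectors.
identical(by_cyl_short, by_cyl_formula)

# Mean mpg for the 4-cylinder group (the 4.mpg entry in the rapply output).
by_cyl_short[["4"]][["mpg"]]
```

The result is a list with one named numeric vector per `cyl` group, which keeps the grouping explicit instead of flattening it into names such as `4.mpg`.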
Konrad Rudolph
  • Thank you for this explanation @Konrad, I've been trying to nest maps and it really helps to break it down like this. Would it please be possible to elaborate on how `map(map_dbl, mean)` is equivalent to `map(~ map_dbl(.x, mean))`? I'm not sure I understand it. – user51462 Sep 09 '19 at 06:50
  • @user51462 It’s equivalent only because the purrr package functions explicitly support both. They are not otherwise equivalent. purrr’s `map` function examines the first argument type to determine what to do. If it’s a function (as in the second example), it’s defined so that `map(xs, f, args)` is the same as `map(xs, function (x) f(x, args))`. If the first argument is a formula, then it transforms the formula into an equivalent function call, i.e. `map(xs, ~ f(.x, args))` is once again the same. – Konrad Rudolph Sep 09 '19 at 13:40
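
The forwarding of extra arguments described in the last comment can be sketched as follows. The list `xs` is a made-up input, and `na.rm = TRUE` plays the role of `args`; this is an illustration under those assumptions, not code from the thread.

```r
library(purrr)

# Hypothetical input: one element contains a missing value.
xs <- list(a = c(1, 2, NA), b = c(4, 5, 6))

# Function plus extra argument: purrr passes na.rm on to mean().
via_function <- map_dbl(xs, mean, na.rm = TRUE)

# Explicit anonymous function.
via_lambda <- map_dbl(xs, function(x) mean(x, na.rm = TRUE))

# Formula shorthand.
via_formula <- map_dbl(xs, ~ mean(.x, na.rm = TRUE))

# All three calls return the same named numeric vector.
identical(via_function, via_lambda) && identical(via_lambda, via_formula)
```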