
I'm learning the `map` function from the purrr package, and the following code doesn't work:

library(purrr)
library(dplyr)
library(tidyr)  # nest() comes from tidyr, not dplyr

df1 = data.frame(type1 = c(rep('a',5),rep('b',5)),
             x = 1:10,
             y = 11:20) 

df1 %>% 
  group_by(type1) %>% 
  nest() %>% 
  map(.$data,with(.x, x + y))

df1 %>% 
  group_by(type1) %>% 
  nest() %>% 
  map(.$data,function(df) df$x + df$y)

For the two blocks of code above, the error returned is:

Error: Index 1 must have length 1

By contrast, the following code works fine:

df1 %>% 
  group_by(type1) %>% 
  nest() %>% .$data %>% 
  map(.,~with(.x, .x$x + .x$y))

Can anyone help me understand these errors and how to fix them?

rawr
Jason
  • Why do you want to `group_by() %>% nest()`? Would using `split()` instead be an option? – Nate Sep 01 '17 at 23:27
  • Did you have a particular use case in mind? Here, it seems like `df1 %>% group_by(type1) %>% mutate(sumxy = x + y)` would be the way to go. – eipi10 Sep 01 '17 at 23:35
  • @eipi10, thanks for your help! The actual function is much more complex than the addition here. – Jason Sep 01 '17 at 23:38
  • @NateDay, thanks for your help! `split()` is definitely an option; here I just want to understand how to use `map` from the purrr package. – Jason Sep 01 '17 at 23:39



You need to wrap the `map()` call in braces. Because `.` appears only inside a nested expression (`.$data`) rather than as a standalone argument, the magrittr pipe applies its first-argument rule (documented in the `%>%` help page) and passes the nested data frame as the first argument of `map()`. You also need `~` to build the function that `map` expects:

df1 %>% 
    group_by(type1) %>% 
    nest() %>% 
    { map(.$data, ~ with(.x, x + y)) }

#[[1]]
#[1] 12 14 16 18 20

#[[2]]
#[1] 22 24 26 28 30

Similarly for the second method:

df1 %>% 
    group_by(type1) %>% 
    nest() %>% 
    { map(.$data, function(df) df$x + df$y) }

#[[1]]
#[1] 12 14 16 18 20

#[[2]]
#[1] 22 24 26 28 30
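The two mechanisms this answer relies on can be checked in isolation. In this minimal sketch, `count_args()` is a hypothetical helper that only reports how many arguments it receives, and `purrr::as_mapper()` is used to expose the function that a `~` formula is turned into:

```r
library(magrittr)
library(purrr)

# Hypothetical helper: reports how many arguments it was called with.
count_args <- function(...) length(list(...))

df <- data.frame(x = 1:3, y = 4:6)

# `.` only appears nested inside `.$x`, so %>% still inserts `df` as the
# first argument: this evaluates count_args(df, df$x).
n_without_braces <- df %>% count_args(.$x)

# Braces switch off the first-argument rule: this evaluates count_args(df$x).
n_with_braces <- df %>% { count_args(.$x) }

# purrr converts the ~ formula into an ordinary function of .x,
# which is the kind of .f that map() expects.
add_cols <- as_mapper(~ .x$x + .x$y)
total <- add_cols(df)
```

Here `n_without_braces` is 2 and `n_with_braces` is 1: the extra leading argument is exactly what breaks `map(.$data, ...)` in the question.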
Psidom
  • Thanks for your help, the code works! Can you elaborate on why the code doesn't work without the braces? `.$data` is a list and should work with `map`. Also, if I replace `map` with `lapply`, the code works too. Why doesn't `map` behave the same way as `lapply`? – Jason Sep 01 '17 at 23:46
  • Without `{}`, your code is equivalent to `map(., .$data, ~ with(.x, x + y))` because of the first-argument rule, which is not a valid call to `map` (the nested tibble becomes `.x` and the list `.$data` becomes `.f`). Besides, how did you get `lapply` to work? It seems to me you still need braces for `lapply` to work properly. – Psidom Sep 01 '17 at 23:52
  • You're right, `lapply` won't work directly unless I extract the data with `.$data` and add another `%>%`. So how do I pass `.$data` as the first argument without using curly braces? I tried `.x = .$data` and it doesn't work. – Jason Sep 02 '17 at 00:00
  • I think you'll have to use `{}` to stop `%>%` from passing another `.` as an argument to `map`. As far as I know, named arguments can't prevent this as long as `.` doesn't appear as a top-level argument. – Psidom Sep 02 '17 at 00:04
  • OK, I'll stick with `{}`. It seems that using `with` achieves the same effect as `{}`; is there any difference between `{}` and `with` here? This is the first time I've used `{}` and I want to understand this operator. Thanks! – Jason Sep 02 '17 at 00:09
  • Generally, when you have complex expressions with `%>%`, I'd suggest sticking with `{}`, as it is the documented approach in magrittr. `with` is convenient for working with a simple data frame, so it may be preferable when you are not using `%>%`. – Psidom Sep 02 '17 at 00:14
  • Thanks for your help! The discussion with you has been really helpful! Would you mind tidying it up a little and adding it to the answer? – Jason Sep 02 '17 at 00:18
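If you would rather avoid braces altogether, one option is to extract the list-column before calling `map()`, so that it becomes the piped value; another is to call `map()` inside `mutate()`, where `data` is referenced as a column. A minimal sketch, assuming tidyr's `nest()` names the list-column `data` (as in the question) and that dplyr provides `pull()`; the names `nested`, `sums`, and `nested_sums` are illustrative:

```r
library(purrr)
library(dplyr)
library(tidyr)

df1 <- data.frame(type1 = c(rep('a', 5), rep('b', 5)),
                  x = 1:10,
                  y = 11:20)

nested <- df1 %>%
  group_by(type1) %>%
  nest()

# Option 1: pull the list-column out first; it is then the piped value,
# so the first-argument rule hands it to map() as .x.
sums <- nested %>%
  pull(data) %>%
  map(~ with(.x, x + y))

# Option 2: keep the results next to their groups by mapping over the
# list-column inside mutate(); `data` is a column name here, not `.$data`.
nested_sums <- nested %>%
  mutate(sumxy = map(data, ~ .x$x + .x$y))
```

The `mutate()` form has the advantage that each result stays on the same row as its `type1` value.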

If you want to use `split()`: I usually split on the grouping factor and then `map` an anonymous function that does what I need for a single tibble/data frame in the resulting list:

df1 %>% 
    split(.$type1) %>%   # first-argument rule: split(df1, df1$type1)
    map(~ mutate(., z = x + y) %>%   # chain verbs as you would on a single tibble
        select(z) %>%
        unlist(recursive = TRUE, use.names = FALSE))
$a
[1] 12 14 16 18 20

$b
[1] 22 24 26 28 30
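For comparison, the same per-group computation can be written with base `split()` and `map()` alone, without dplyr or a pipe. A minimal sketch, assuming only purrr is installed; `pieces` and `sums` are illustrative names:

```r
library(purrr)

df1 <- data.frame(type1 = c(rep('a', 5), rep('b', 5)),
                  x = 1:10,
                  y = 11:20)

# split() returns a named list of data frames, one per value of type1
pieces <- split(df1, df1$type1)

# map() applies the formula lambda to each data frame in turn;
# .x is the current piece, and the list names ("a", "b") are kept
sums <- map(pieces, ~ .x$x + .x$y)
```

Because `split()` is called directly here, there is no pipe and no first-argument rule to think about.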
Nate