Why does using pipes and map fail on a list of data frames?

Question

I have tibbles nested in a list column, together with an identifier column. I would like to run anonymous functions on each nested tibble. However, when I pipe the main data frame into map() and refer to the list column that holds my data, map() does not work.

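# Packages: dplyr for group_by(), tidyr for nest(), purrr for map()
library(dplyr)
library(tidyr)
library(purrr)

# Creating the df
df_nested <- iris %>% group_by(Species) %>% nest()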

# Does not work
# df_nested %>% 
# map(data, nrow)

# Works
map(df_nested$data, nrow)

I would like to understand why the code doesn't work when I use a pipe.

score 2 · Accepted Answer · answered Jun 01 '19 at 12:04

That is because, with the pipe (%>%), the left-hand side (LHS) is passed by default as the first argument of the function on the right-hand side.
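A minimal illustration of that rule, using a hypothetical two-argument function describe() (not from any package) and magrittr's %>%. It also shows the . placeholder, which puts the LHS somewhere other than the first argument:

```r
library(magrittr)

# Hypothetical helper, only used to show which argument receives which value.
describe <- function(first, second) paste("first =", first, "| second =", second)

piped  <- "lhs" %>% describe("rhs")   # LHS becomes the first argument
direct <- describe("lhs", "rhs")      # the equivalent explicit call
stopifnot(identical(piped, direct))
print(piped)
# [1] "first = lhs | second = rhs"

# When . appears as an argument, the LHS goes there instead of first.
dotted <- "lhs" %>% describe("rhs", .)
print(dotted)
# [1] "first = rhs | second = lhs"
```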

When you are doing

df_nested %>% map(data, nrow)

you get

#$Species
#[1] ".x[[i]]" "nrow"   

#$data
#[1] ".x[[i]]" "nrow"   

#Warning messages:
#1: In .f(.x[[i]], ...) : data set ‘.x[[i]]’ not found
#2: In .f(.x[[i]], ...) : data set ‘nrow’ not found
#3: In .f(.x[[i]], ...) : data set ‘.x[[i]]’ not found
#4: In .f(.x[[i]], ...) : data set ‘nrow’ not found

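which is the same as

map(df_nested, data, nrow)

Here map() iterates over the columns of df_nested (Species and data), not over the nested tibbles. And because data is not evaluated inside the data frame, R finds the function utils::data() instead, so each column triggers a call like data(.x[[i]], nrow), which is where the "data set not found" warnings come from.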
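A minimal check of both points, assuming a fresh R session in which no object named data has been defined and dplyr, tidyr and purrr are installed:

```r
library(dplyr)
library(tidyr)
library(purrr)

df_nested <- iris %>% group_by(Species) %>% nest()

# With no user object called `data`, the bare name is found on the search
# path as the function utils::data(), not as the list column.
stopifnot(identical(data, utils::data))

# map() over a data frame visits each column in turn.
per_column <- map(df_nested, length)
print(names(per_column))
# [1] "Species" "data"
```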

If you want to use a pipe, extract the list column first:

df_nested$data %>% map(nrow)

#[[1]]
#[1] 50

#[[2]]
#[1] 50

#[[3]]
#[1] 50
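Two other pipe-friendly sketches, assuming the same packages. pull() from dplyr extracts the list column inside the pipe, while wrapping the right-hand side in braces stops magrittr from inserting the LHS as the first argument, so . can be used explicitly. map_int() is used here instead of map() to get an integer vector rather than a list:

```r
library(dplyr)
library(tidyr)
library(purrr)

df_nested <- iris %>% group_by(Species) %>% nest()

# Variant 1: pull() hands the list column, not the data frame, to map_int().
rows_pulled <- df_nested %>% pull(data) %>% map_int(nrow)

# Variant 2: inside braces, . is the whole nested data frame and is not
# passed automatically as the first argument.
rows_braced <- df_nested %>% { map_int(.$data, nrow) }
```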

score 2 · Answer 2 · answered Jun 01 '19 at 13:00

With nested data it is generally better to call map() inside mutate(), where data refers to the list column of the data frame:

df_nested %>% 
   mutate(Nrow=map(data,nrow)) %>% 
   unnest(Nrow)

## A tibble: 3 x 3
#  Species    data               Nrow
#  <fct>      <list>            <int>
#1 setosa     <tibble [50 x 4]>    50
#2 versicolor <tibble [50 x 4]>    50
#3 virginica  <tibble [50 x 4]>    50
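A variant of the same approach, as a sketch assuming dplyr, tidyr and purrr are installed: map_int() returns a plain integer vector instead of a list, so the counts can be stored as a column directly and the unnest() step is not needed.

```r
library(dplyr)
library(tidyr)
library(purrr)

df_nested <- iris %>% group_by(Species) %>% nest()

# Inside mutate(), `data` is evaluated as the list column of df_nested.
# map_int() yields one integer per nested tibble.
df_counts <- df_nested %>% mutate(Nrow = map_int(data, nrow))

print(df_counts)
```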
