I have tibbles nested in a list-column, alongside an identifier column. I would like to run anonymous functions on each nested tibble. However, when I pipe the main data frame into map and refer to the list-column that holds my data, map does not work.

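# Packages providing %>%, group_by(), nest() and map()
library(dplyr)
library(tidyr)
library(purrr)

# Creating the df
df_nested <- iris %>% group_by(Species) %>% nest()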

# Does not work
# df_nested %>% 
# map(data, nrow)

# Works
map(df_nested$data, nrow)

I would like to understand why the code doesn't work when using a pipe.

NelsonGon

2 Answers

That is because the pipe (%>%) passes its left-hand side (LHS) as the first argument of the function on the right by default.

When you are doing

df_nested %>% map(data, nrow)

you get

#$Species
#[1] ".x[[i]]" "nrow"   

#$data
#[1] ".x[[i]]" "nrow"   

#Warning messages:
#1: In .f(.x[[i]], ...) : data set ‘.x[[i]]’ not found
#2: In .f(.x[[i]], ...) : data set ‘nrow’ not found
#3: In .f(.x[[i]], ...) : data set ‘.x[[i]]’ not found
#4: In .f(.x[[i]], ...) : data set ‘nrow’ not found

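which is the same as

map(df_nested, data, nrow)

Here map iterates over the columns of df_nested (Species and data), and data is resolved as the function utils::data() rather than as the list-column, which is where the "data set not found" warnings come from.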
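
To check that map walks over the columns of the data frame when the whole data frame is piped in, a minimal sketch (assuming dplyr, tidyr and purrr are installed; cols_seen is just an illustrative name):

```r
library(dplyr)
library(tidyr)
library(purrr)

df_nested <- iris %>% group_by(Species) %>% nest()

# A data frame is a list of columns, so map() visits one column per step.
# The names of the result are therefore the column names, not the groups.
cols_seen <- names(map(df_nested, class))
print(cols_seen)
```

The result has one element per column, which matches the $Species and $data entries in the output above.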

If you want to use pipes, pass the list-column itself as the LHS:

df_nested$data %>% map(nrow)

#[[1]]
#[1] 50

#[[2]]
#[1] 50

#[[3]]
#[1] 50
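
Two other ways to keep the pipe starting from df_nested, given as a sketch rather than the only options (res_pull and res_dot are illustrative names): extract the list-column with pull(), or wrap the call in braces so that magrittr does not insert the LHS as the first argument.

```r
library(dplyr)
library(tidyr)
library(purrr)

df_nested <- iris %>% group_by(Species) %>% nest()

# pull() extracts the list-column, so map() receives it as its first argument
res_pull <- df_nested %>% pull(data) %>% map(nrow)

# Braces stop the pipe from inserting the LHS as the first argument;
# the dot then refers to df_nested explicitly
res_dot <- df_nested %>% { map(.$data, nrow) }
```

Both forms hand map the list of tibbles instead of the whole data frame.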
Ronak Shah

With nested data it is usually more convenient to use mutate, which keeps each result next to its identifier column:

df_nested %>%
  mutate(Nrow = map(data, nrow)) %>%
  unnest(Nrow)
# A tibble: 3 x 3
  Species    data               Nrow
  <fct>      <list>            <int>
1 setosa     <tibble [50 x 4]>    50
2 versicolor <tibble [50 x 4]>    50
3 virginica  <tibble [50 x 4]>    50
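
Since the question asks about anonymous functions, the same pattern can be sketched with purrr's typed variants, which return atomic vectors and so need no unnest step. The column name mean_sepal and the choice of Sepal.Length are illustrative assumptions, not part of the original answer.

```r
library(dplyr)
library(tidyr)
library(purrr)

df_nested <- iris %>% group_by(Species) %>% nest()

res <- df_nested %>%
  mutate(
    # map_int() returns an integer vector instead of a list
    Nrow = map_int(data, nrow),
    # ~ ... is purrr's shorthand for an anonymous function; .x is one nested tibble
    mean_sepal = map_dbl(data, ~ mean(.x$Sepal.Length))
  ) %>%
  ungroup()

print(res)
```

Using map_int and map_dbl also makes the call fail early if a function returns something other than a single integer or double.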
NelsonGon