
Here's a dumb example dataframe:

library(tidyverse)

# tibble() replaces the deprecated data_frame()
df <- tibble(A = c(rep(1, 5), rep(2, 4)), B = 1:9) %>% 
  group_by(A) %>% 
  nest()

which looks like this:

> df
# A tibble: 2 × 2
      A             data
  <dbl>           <list>
1     1 <tibble [5 × 1]>
2     2 <tibble [4 × 1]>

I would like to add a third column called N whose entries are the number of rows in each nested data frame in data. I figured this would work:

> df %>% 
+   mutate(N = nrow(data))
Error: Unsupported type NILSXP for column "N"

What's going wrong?

crf
    Try this: `df$nRow <- sapply(df$data, nrow)` instead. You need this done one cell at a time. Can't do `nrow` on the whole column of data frames. – Gopala May 04 '17 at 16:02

3 Answers


Combining dplyr and purrr you could do:

library(tidyverse)

df %>% 
  mutate(n = map_dbl(data, nrow))
#> # A tibble: 2 × 3
#>       A             data     n
#>   <dbl>           <list> <dbl>
#> 1     1 <tibble [5 × 1]>     5
#> 2     2 <tibble [4 × 1]>     4

I like this approach because you stay within your usual workflow, creating the new column inside mutate, while using the map_* family because data is a list column and has to be processed one element at a time.
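To make the reasoning concrete, here is a minimal sketch of why the original attempt fails and how mapping over the list column fixes it. It assumes a recent tidyverse, where `tibble()` replaces `data_frame()`; the `ungroup()` call and the choice of `map_int()` instead of `map_dbl()` are my additions, not part of the answer above.

```r
library(dplyr)
library(tidyr)
library(purrr)

# Rebuild the example from the question (assumption: recent tidyverse).
df <- tibble(A = c(rep(1, 5), rep(2, 4)), B = 1:9) %>%
  group_by(A) %>%
  nest() %>%
  ungroup()  # drop any grouping left over from group_by()

# The list column has no dim attribute, so nrow() on the whole column is NULL.
is.null(nrow(df$data))
#> [1] TRUE

# map_int() calls nrow() once per nested tibble and returns an integer vector.
res <- df %>%
  mutate(N = map_int(data, nrow))
res$N
#> [1] 5 4
```

`map_int()` keeps the counts as integers, whereas `map_dbl()` converts them to doubles. Either works here.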

Thomas K

You could do:

df %>%
  rowwise() %>%
  mutate(N = nrow(data))

Which gives:

#Source: local data frame [2 x 3]
#Groups: <by row>
#
## A tibble: 2 × 3
#      A             data     N
#  <dbl>           <list> <int>
#1     1 <tibble [5 × 1]>     5
#2     2 <tibble [4 × 1]>     4
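One thing to keep in mind with this approach: the result stays row-wise, so later verbs keep operating one row at a time. A hedged sketch, assuming a recent dplyr and rebuilding the question's example with `tibble()`, that drops the row-wise grouping afterwards:

```r
library(dplyr)
library(tidyr)

# Rebuild the example from the question (assumption: recent tidyverse).
df <- tibble(A = c(rep(1, 5), rep(2, 4)), B = 1:9) %>%
  group_by(A) %>%
  nest() %>%
  ungroup()

res <- df %>%
  rowwise() %>%              # inside mutate(), data is now a single nested tibble
  mutate(N = nrow(data)) %>%
  ungroup()                  # back to an ordinary tibble for later verbs

res$N
#> [1] 5 4
```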
Steven Beaupré
    Very nice solution within `dplyr`, compared to the `apply` family I mentioned above. I like this a lot. – Gopala May 04 '17 at 16:30

With dplyr:

df %>% 
  group_by(A) %>%
  mutate(N = nrow(data.frame(data)))
#>       A             data     N
#>   <dbl>           <list> <int>
#> 1     1 <tibble [5 × 1]>     5
#> 2     2 <tibble [4 × 1]>     4
eipi10