Transforming a nested data frame with varying number of elements

Question

I have a data frame with a column of nested data frames with 1 or 2 columns and n rows. It looks like df in the sample below:

'data.frame':   3 obs. of  2 variables:
 $ vector:List of 3
  ..$ : chr "p1"
  ..$ : chr "p2"
  ..$ : chr "p3"
 $ lists :List of 3
  ..$ :'data.frame':    2 obs. of  2 variables:
  .. ..$ n1: Factor w/ 2 levels "a","b": 1 2
  .. ..$ n2: Factor w/ 2 levels "1","2": 1 2
  ..$ :'data.frame':    1 obs. of  1 variable:
  .. ..$ n1: Factor w/ 1 level "d": 1
  ..$ :'data.frame':    1 obs. of  2 variables:
  .. ..$ n1: Factor w/ 1 level "e": 1
  .. ..$ n2: Factor w/ 1 level "3": 1

df can be recreated like this:

v <- c("p1", "p2", "p3")
l <- list(data.frame(n1 = c("a", "b"), n2 = c("1", "2")), data.frame(n1 = "d"), data.frame(n1 = "e", n2 = "3"))
df <- as.data.frame(cbind(v, l))

I'd like to transform it into a data frame that looks like this:

v   n1  n2
p1  a   1
p1  b   2
p2  d   NA
p3  e   3

n1 and n2 are in separate columns
if the data frame in row i has n rows, the vector element of row i should be repeated n times
if there is no content in n1 or n2, there should be an NA

I've tried using tidyr::unnest, but got the following error:

 unnest(df)
Error: All nested columns must have the same number of elements.

Does anyone have a better idea of how to transform the data frame into the desired format?

eipi10 · Answer 1 · 2017-12-08T21:43:18.137

2

With purrr::pmap_df, we combine v and l into one data frame for each row of df, and then bind those per-row data frames into a single data frame.

library(tidyverse)

pmap_df(df, function(v,l) {
  data.frame(v,l)
})

   v n1   n2
1 p1  a    1
2 p1  b    2
3 p2  d <NA>
4 p3  e    3
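
To make the two steps explicit, here is a minimal sketch that does the same thing with purrr::map2 and dplyr::bind_rows. It is an illustration, not the code above: it assumes v and l are kept as a plain vector and a plain list (rather than the cbind-built df), and that the nested columns are character rather than factor (stringsAsFactors = FALSE), so no factor-combination warnings arise.

```r
library(purrr)
library(dplyr)

# Assumption: the inputs are a plain character vector and a plain list,
# with character (not factor) columns in the nested data frames.
v <- c("p1", "p2", "p3")
l <- list(
  data.frame(n1 = c("a", "b"), n2 = c("1", "2"), stringsAsFactors = FALSE),
  data.frame(n1 = "d", stringsAsFactors = FALSE),
  data.frame(n1 = "e", n2 = "3", stringsAsFactors = FALSE)
)

# Step 1: pair each label with its nested data frame. data.frame() recycles
# the length-1 label across all rows of the nested frame.
pieces <- map2(v, l, function(label, nested) {
  data.frame(v = label, nested, stringsAsFactors = FALSE)
})

# Step 2: stack the pieces. bind_rows() fills columns that are missing
# from a piece (here n2 for "p2") with NA.
result <- bind_rows(pieces)
result
```

Splitting the work this way shows where each requirement is met: the repetition of v happens in step 1, and the NA filling happens in step 2.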

edited Dec 08 '17 at 21:43

answered Dec 08 '17 at 21:25

eipi10


score 1 · Accepted Answer · answered Dec 08 '17 at 21:35

1

This will avoid by-row operations, which will be important if you have a lot of rows.

library(data.table)

rbindlist(df$l, fill = TRUE, idcol = 'row')[, v := df$v[row]][]
#   row n1 n2  v
#1:   1  a  1 p1
#2:   1  b  2 p1
#3:   2  d NA p2
#4:   3  e  3 p3

answered Dec 08 '17 at 21:35

eddi


That works perfectly! One more question on this: how can I use it if there is more than one column like v whose content should be repeated? – PMH Dec 08 '17 at 22:33
maybe `setDT(df); rbindlist(df$l, fill = T, id = 'row')[, c(.SD, df[row, -'l'])]` – eddi Dec 08 '17 at 22:51
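
The follow-up about several repeated columns can be sketched as below. This is a variation on the idea in the comment, not eddi's exact code: the second label column w and its values are hypothetical, and the outer columns are held in a separate data.table (outer_cols) instead of alongside the list column.

```r
library(data.table)

# Hypothetical outer columns: v from the question plus an invented column w.
outer_cols <- data.table(v = c("p1", "p2", "p3"), w = c(10, 20, 30))

# Nested data frames, one per outer row (character columns assumed).
l <- list(
  data.frame(n1 = c("a", "b"), n2 = c("1", "2"), stringsAsFactors = FALSE),
  data.frame(n1 = "d", stringsAsFactors = FALSE),
  data.frame(n1 = "e", n2 = "3", stringsAsFactors = FALSE)
)

# Stack the nested frames; idcol records which list element (outer row)
# each stacked row came from, and fill = TRUE pads missing columns with NA.
stacked <- rbindlist(l, fill = TRUE, idcol = "row")

# Repeat every outer column by indexing with the recorded row numbers,
# then bind the repeated outer columns next to the stacked inner columns.
result <- cbind(outer_cols[stacked$row], stacked[, !"row"])
result
```

Because the repetition is a single vectorised row lookup, it works the same way for any number of outer columns.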

score 0 · Answer 3 · answered Dec 08 '17 at 21:27

A solution using dplyr and tidyr. suppressWarnings is not required: the nested data frames were created with factor columns, and suppressWarnings only silences the warning messages emitted when those factors are combined.

library(dplyr)
library(tidyr)

df1 <- suppressWarnings(df %>%
  mutate(v = unlist(.$v)) %>%
  unnest())
df1
#    v n1   n2
# 1 p1  a    1
# 2 p1  b    2
# 3 p2  d <NA>
# 4 p3  e    3
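
For reference, a minimal sketch of the same idea with a tibble list-column. It assumes tidyr 1.0.0 or later, where unnest() takes an explicit cols argument and fills columns missing from some nested data frames with NA; the inputs are rebuilt here as a plain vector and list with character columns, which is an assumption and not the df from the question.

```r
library(tibble)
library(tidyr)

# Assumption: character columns, so no factors need to be combined.
v <- c("p1", "p2", "p3")
l <- list(
  data.frame(n1 = c("a", "b"), n2 = c("1", "2"), stringsAsFactors = FALSE),
  data.frame(n1 = "d", stringsAsFactors = FALSE),
  data.frame(n1 = "e", n2 = "3", stringsAsFactors = FALSE)
)

# A tibble keeps l as a list-column of data frames and v as a plain
# character column, so no unlist() step is needed.
nested <- tibble(v = v, l = l)

# unnest() repeats v once per row of each nested data frame.
result <- unnest(nested, cols = l)
result
```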
