
I have a data frame in which one column holds nested data frames, each with 1 or 2 columns and n rows. Its structure looks like df in the sample below:

'data.frame':   3 obs. of  2 variables:
 $ vector:List of 3
  ..$ : chr "p1"
  ..$ : chr "p2"
  ..$ : chr "p3"
 $ lists :List of 3
  ..$ :'data.frame':    2 obs. of  2 variables:
  .. ..$ n1: Factor w/ 2 levels "a","b": 1 2
  .. ..$ n2: Factor w/ 2 levels "1","2": 1 2
  ..$ :'data.frame':    1 obs. of  1 variable:
  .. ..$ n1: Factor w/ 1 level "d": 1
  ..$ :'data.frame':    1 obs. of  2 variables:
  .. ..$ n1: Factor w/ 1 level "e": 1
  .. ..$ n2: Factor w/ 1 level "3": 1

df can be recreated like this (note that this code names the columns v and l rather than vector and lists):

v <- c("p1", "p2", "p3")
l <- list(data.frame(n1 = c("a", "b"), n2 = c("1", "2")), data.frame(n1 = "d"), data.frame(n1 = "e", n2 = "3"))
df <- as.data.frame(cbind(v, l))

I'd like to transform it into a data frame that looks like this:

v   n1  n2
p1  a   1
p1  b   2
p2  d   NA
p3  e   3

  • n1 and n2 are in separate columns
  • if the data frame in row i has n rows, the vector element of row i should be repeated n times
  • if there is no content in n1 or n2, there should be an NA

I've tried using tidyr::unnest but got the following error:

 unnest(df)
Error: All nested columns must have the same number of elements.

Does anyone have a better idea of how to transform the data frame into the desired format?

PMH

3 Answers


Using purrr::pmap_df, we combine v and l from each row of df into one data frame per row, and pmap_df then row-binds those into a single data frame.

library(tidyverse)

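# For each row, recycle v alongside the rows of the nested data frame l;
# pmap_df row-binds the per-row results and fills missing columns with NA
pmap_df(df, function(v, l) {
  data.frame(v, l)
})
#    v n1   n2
# 1 p1  a    1
# 2 p1  b    2
# 3 p2  d <NA>
# 4 p3  e    3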
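
For readers without purrr, the same per-row combine can be sketched in base R. This is only a rough equivalent, assuming df is built as in the question; the helper name expand_row and the variable all_cols are made up for illustration.

```r
# Rebuild df as in the question
v <- c("p1", "p2", "p3")
l <- list(
  data.frame(n1 = c("a", "b"), n2 = c("1", "2")),
  data.frame(n1 = "d"),
  data.frame(n1 = "e", n2 = "3")
)
df <- as.data.frame(cbind(v, l))

# Every column name that occurs in any nested data frame
all_cols <- unique(unlist(lapply(df$l, names)))

# Hypothetical helper: add missing columns as NA, then prepend v,
# which data.frame() recycles to the number of nested rows
expand_row <- function(v, l) {
  for (nm in setdiff(all_cols, names(l))) l[[nm]] <- NA
  data.frame(v = v, l[all_cols], stringsAsFactors = FALSE)
}

# Apply the helper to each (v, l) pair and row-bind the pieces
res <- do.call(rbind, Map(expand_row, df$v, df$l))
rownames(res) <- NULL
res
```

Base rbind has no fill option, which is why the missing columns are padded by hand before binding.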
eipi10

This avoids by-row operations, which matters if you have a lot of rows.

library(data.table)

rbindlist(df$l, fill = TRUE, idcol = 'row')[, v := df$v[row]][]
#   row n1 n2  v
#1:   1  a  1 p1
#2:   1  b  2 p1
#3:   2  d NA p2
#4:   3  e  3 p3
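
The indexing step df$v[row] can be illustrated on its own in base R. This is a minimal sketch, not part of the data.table call above: it assumes the same v and l as in the question, and w is a hypothetical second column added only to show that any number of parent columns can be repeated the same way.

```r
v <- c("p1", "p2", "p3")
w <- c("x", "y", "z")  # hypothetical extra column, not in the question
l <- list(
  data.frame(n1 = c("a", "b"), n2 = c("1", "2")),
  data.frame(n1 = "d"),
  data.frame(n1 = "e", n2 = "3")
)

# Number of rows in each nested data frame
n <- vapply(l, nrow, integer(1))

# Parent row index for every nested row (plays the role of the 'row' id column)
row <- rep(seq_along(l), times = n)

# Indexing with 'row' repeats each parent value once per nested row
ids <- data.frame(v = v[row], w = w[row], stringsAsFactors = FALSE)
ids
```

Because the repetition is a single vectorised index, no per-row function call is needed.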
eddi
  • That works perfectly fine! One more question on this: How to use this function if there are more than 1 column v with content that should be repeated? – PMH Dec 08 '17 at 22:33
  • maybe `setDT(df); rbindlist(df$l, fill = T, id = 'row')[, c(.SD, df[row, -'l'])]` – eddi Dec 08 '17 at 22:51

A solution using dplyr and tidyr. suppressWarnings is not required: if the nested data frames were created with factor columns, it only silences the warnings raised when those factors are combined.

library(dplyr)
library(tidyr)

df1 <- suppressWarnings(df %>%
  # v is a list column; flatten it to a plain vector first
  mutate(v = unlist(.$v)) %>%
  # name the column to unnest explicitly
  unnest(l))
df1
#    v n1   n2
# 1 p1  a    1
# 2 p1  b    2
# 3 p2  d <NA>
# 4 p3  e    3
www