Melt data.frame containing list to long format (efficiently)

Question

I would like to convert an R data.frame containing a list column with elements of uneven length to long format, as in the example below.

testdf <- data.frame(a=1:3,b=I(list(letters[1:2],letters[2:7],letters[1:3])))
testdf
  a            b
1 1         a, b
2 2 b, c, d,....
3 3      a, b, c

The resulting format should look like this:

   a b
1  1 a
2  1 b
3  2 b
4  2 c
5  2 d
6  2 e
7  2 f
8  2 g
9  3 a
10 3 b
11 3 c

A for-loop approach to achieve this would look something like this:

resultdf <- data.frame()
for (i in seq_len(nrow(testdf))) {  # one rbind() per input row
  resultdf <- rbind(resultdf, data.frame(a = testdf$a[i], b = testdf$b[[i]]))
}

However, this proves to be very slow, and I need to do this for a large data.frame (~6 million rows, with an average list length of 10 items). I tried melt as follows:

library(reshape2)
> melt(testdf, id.var="a", value.var="b")
Error: Can't melt data.frames with non-atomic 'measure' columns

But I'm not even sure whether melt is meant to work with lists. What would be the fastest way to do this?

By the way, I produce the initial data.frame using str_extract_all().
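
For reference, the target long format can also be built in base R without a loop: repeat each value of a once per element of its list entry, then flatten the list column with unlist(). This is only a sketch on the example data. It assumes R 3.2.0 or later for lengths(), and it has not been timed on the ~6 million row case.

```r
# Example data from the question: `b` is a list column with uneven lengths
testdf <- data.frame(a = 1:3, b = I(list(letters[1:2], letters[2:7], letters[1:3])))

# Number of elements in each list entry (2, 6 and 3 for the example data)
n_per_row <- lengths(testdf$b)

# Repeat each `a` once per element, and flatten `b` into one character vector
resultdf <- data.frame(
  a = rep(testdf$a, times = n_per_row),
  b = unlist(testdf$b, use.names = FALSE),
  stringsAsFactors = FALSE
)

nrow(resultdf)  # 11 rows: 2 + 6 + 3
```

Because rep() and unlist() each run once over the whole column, this avoids the repeated rbind() calls that make the loop grow slower with every row.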

`library(data.table); setDT(testdf)[, .(b = unlist(b)), by = a]` or `library(splitstackshape); res <- listCol_l(testdf, "b")` - David Arenburg, Aug 25 '15 at 08:22
Thanks a lot! This is it, and it's incredibly fast! I'll go for the data.table library. Can you post this as an answer to my question? - supersambo, Aug 25 '15 at 08:33
I already closed it as a dupe. These answers already appear there. If you have additional columns in your data, I would suggest the `splitstackshape` package route (it uses `data.table` under the hood). Still not sure why `tidyr::unnest` doesn't work here, but I don't really care. - David Arenburg, Aug 25 '15 at 08:35
@DavidArenburg, not quite sure what you mean. `tidyr::unnest(testdf, b)` works perfectly fine for me. - talat, Aug 25 '15 at 09:17
@docendodiscimus I tried `tidyr::unnest(testdf)` as per the dupe and it didn't work. Not sure when you need to specify the column and when you don't. - David Arenburg, Aug 25 '15 at 09:23
@DavidArenburg, the equivalent of the dupe would be `testdf %>% unnest(b)`. It's in the documentation. - talat, Aug 25 '15 at 09:27
@docendodiscimus oh right, I hadn't noticed that the column was actually specified there because of the unnecessary pipe. - David Arenburg, Aug 25 '15 at 09:29
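
The two approaches discussed in the comments above can be sketched on the example data as follows. The data.table call is the one from the first comment, applied to a copy made with as.data.table() rather than setDT(), so that testdf itself is left unchanged. The tidyr call names the list column explicitly, which is what the exchange about `unnest(testdf)` versus `unnest(testdf, b)` comes down to. The helper `plaindf` is a name introduced here for illustration only: it holds the same data with a plain list column (no I() wrapper), on the assumption that this is the safest input for current tidyr versions. Neither call has been benchmarked here.

```r
library(data.table)
library(tidyr)

# Example data from the question
testdf <- data.frame(a = 1:3, b = I(list(letters[1:2], letters[2:7], letters[1:3])))

# data.table: for each value of `a`, flatten its list entry into rows
dt <- as.data.table(testdf)
long_dt <- dt[, .(b = unlist(b)), by = a]

# tidyr: name the list column to unnest explicitly
plaindf <- data.frame(a = 1:3)  # hypothetical helper copy, not from the question
plaindf$b <- list(letters[1:2], letters[2:7], letters[1:3])
long_tidy <- unnest(plaindf, cols = b)
```

Note that grouping with `by = a` treats `a` as the row identifier; if `a` contained repeated values, their list entries would be combined under the same group.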
