Following a web scrape with RCurl, I've used XML's readHTMLTable and now have a list of 100 dataframes, each with 40 observations of two variables. I would like to convert this to a single dataframe of 100 rows and 40 columns. The first column in each dataframe contains what I would like to become the column names of the single dataframe. This is as close as I can get to an MWE (each of the dataframes in my actual list is named NULL):

description <- c("name", "location", "age")
value <- c("mike", "florida", "25")
df1 <- data.frame(description, value)
description <- c("name", "location", "tenure")
value <- c("jim", "new york", "5")
df2 <- data.frame(description, value)
list <- list(df1, df2)

# list output
[[1]]
  description   value
1        name    mike
2    location florida
3         age      25

[[2]]
  description    value
1        name      jim
2    location new york
3      tenure        5

Here is the general output I'm hoping to achieve:

library(reshape2)
listm <- melt(list)
dcast(listm, L1 ~ description)
# dcast output
  L1  age location name tenure
1  1   25  florida mike   <NA>
2  2 <NA> new york  jim      5

My issue, as mentioned above, and one I don't know how to reproduce in an MWE, is that each dataframe in my actual list is named NULL, so there is no unique identifier by which to cast the data.

How can I deal with this issue in reshape2 and/or plyr?

Adam Smith

1 Answer

You can use rep with the number of rows of each data.frame in your list to build the L1 column yourself, so the names of the list elements don't matter. Then it's straightforward to cast:

# ll is your list of data.frames
ll.df <- cbind(L1 = rep(seq_along(ll), sapply(ll, nrow)), do.call(rbind, ll))

require(reshape2)
dcast(ll.df, L1 ~ description)
  L1  age location name tenure
1  1   25  florida mike   <NA>
2  2 <NA> new york  jim      5
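To see what the rep call contributes on its own, here is a minimal sketch. It assumes a stand-in `ll` built from the two example dataframes in the question; `n_rows` and `id` are names made up for illustration.

```r
# Stand-in for the real list; only the row counts matter for this step
ll <- list(
  data.frame(description = c("name", "location", "age"),
             value = c("mike", "florida", "25")),
  data.frame(description = c("name", "location", "tenure"),
             value = c("jim", "new york", "5"))
)

# Number of rows in each list element
n_rows <- sapply(ll, nrow)

# Repeat each list index once per row of its data frame
id <- rep(seq_along(ll), n_rows)
id
# [1] 1 1 1 2 2 2
```

Each row of the stacked data.frame gets the position of the list element it came from, which is the identifier dcast needs on the left-hand side of the formula.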
Arun
  • I'm getting the following error: `Error in rbind(deparse.level, ...) : numbers of columns of arguments do not match`. Any thoughts? Thanks. – Adam Smith Sep 05 '13 at 19:37
  • `sapply(ll, ncol)` helped me identify a malformed dataframe that had 3 columns. I removed it, and your solution worked. Thank you. – Adam Smith Sep 05 '13 at 21:03
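For anyone hitting the same rbind error, the check described in the comments can be sketched roughly as below. This assumes every well-formed dataframe has exactly two columns, as in the question; `bad`, `n_cols`, `ll_ok` and `combined` are hypothetical names, and the malformed element is invented for the example.

```r
# Hypothetical list with one malformed (3-column) element in the middle
df1 <- data.frame(description = c("name", "location", "age"),
                  value = c("mike", "florida", "25"))
df2 <- data.frame(description = c("name", "location", "tenure"),
                  value = c("jim", "new york", "5"))
bad <- data.frame(a = "x", b = "y", c = "z")
ll <- list(df1, bad, df2)

# Column count per element; anything other than 2 is treated as malformed
n_cols <- sapply(ll, ncol)
which(n_cols != 2)

# Keep only the two-column data frames
ll_ok <- ll[n_cols == 2]

# rbind no longer fails because all remaining elements have matching columns
combined <- do.call(rbind, ll_ok)
```

Note that after filtering, positions in `ll_ok` no longer match positions in the original list, so an L1 column built from `seq_along(ll_ok)` numbers the surviving dataframes only.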