Setup:
I have a tibble (named data) with an embedded list of data.frames.
df1 <- data.frame(name = c("columnName1","columnName2","columnName3"),
value = c("yes", 1L, 0L),
stringsAsFactors = F)
df2 <- data.frame(name = c("columnName1","columnName2","columnName3"),
value = c("no", 1L, 1L),
stringsAsFactors = F)
df3 <- data.frame(name = c("columnName1","columnName2","columnName3"),
value = c("yes", 0L, 0L),
stringsAsFactors = F)
responses = list(df1,
df2,
df3)
data <- tibble(ids = c(23L, 42L, 84L),
responses = responses)
Note this is a simplified example of the data. The original data is from a flat json file and loaded with jsonlite::stream_in()
function.
Objective:
My goal is to convert this tibble to another tibble where the embedded data.frames are spread (transposed) as columns; for example, my goal tibble is:
goal <- tibble(ids = c(23L, 42L, 84L),
columnName1 = c("yes","no","yes"),
columnName2 = c(1L, 1L, 0L),
columnName3 = c(0L, 1L, 0L))
# goal tibble
> goal
# A tibble: 3 x 4
ids columnName1 columnName2 columnName3
<int> <chr> <int> <int>
1 23 yes 1 0
2 42 no 1 1
3 84 yes 0 0
My inelegant solution:
Use dplyr::bind_rows()
and tidyr::spread()
:
rdf <- dplyr::bind_rows(data$responses, .id = "id") %>%
tidyr::spread(key = "name", -id)
goal2 <- cbind(ids = data$ids, rdf[,-1]) %>%
as.tibble()
Comparing my solution to the goal:
# produced tibble
> goal2
# A tibble: 3 x 4
ids columnName1 columnName2 columnName3
* <int> <chr> <chr> <chr>
1 23 yes 1 0
2 42 no 1 1
3 84 yes 0 0
Overall, my solution works but has a few problems:
I don't know how to pass the unique ids through
bind_rows()
which forces me to create a dummy id ("id") which can't match to the original id ("ids"). This forces me to use acbind()
(which I don't like) and manually remove the dummy id (using -1 slicing onrdf
).The format of the columns are lost as my approach converts the integer columns to characters.
Any suggestions on how to improve my solution (especially using tidyverse based packages like tidyjson or tidyr)?