2

Setup:

I have a tibble (named data) with an embedded list of data.frames.

df1 <- data.frame(name = c("columnName1","columnName2","columnName3"),
                  value = c("yes", 1L, 0L),
                  stringsAsFactors = F)

df2 <- data.frame(name = c("columnName1","columnName2","columnName3"),
                  value = c("no", 1L, 1L),
                  stringsAsFactors = F)

df3 <- data.frame(name = c("columnName1","columnName2","columnName3"),
                  value = c("yes", 0L, 0L),
                  stringsAsFactors = F)

responses = list(df1,
                 df2,
                 df3)

data <- tibble(ids = c(23L, 42L, 84L),
               responses = responses)

Note this is a simplified example of the data. The original data is from a flat json file and loaded with jsonlite::stream_in() function.

Objective:

My goal is to convert this tibble to another tibble where the embedded data.frames are spread (transposed) as columns; for example, my goal tibble is:

goal <- tibble(ids = c(23L, 42L, 84L),
               columnName1 = c("yes","no","yes"),
               columnName2 = c(1L, 1L, 0L),
               columnName3 = c(0L, 1L, 0L))

# goal tibble
> goal
# A tibble: 3 x 4
    ids columnName1 columnName2 columnName3
  <int> <chr>             <int>       <int>
1    23 yes                   1           0
2    42 no                    1           1
3    84 yes                   0           0

My inelegant solution:

Use dplyr::bind_rows() and tidyr::spread():

rdf <- dplyr::bind_rows(data$responses, .id = "id") %>%
  tidyr::spread(key = "name", -id)

goal2 <- cbind(ids = data$ids, rdf[,-1]) %>%
  as.tibble()

Comparing my solution to the goal:

# produced tibble
> goal2
# A tibble: 3 x 4
    ids columnName1 columnName2 columnName3
* <int> <chr>       <chr>       <chr>      
1    23 yes         1           0          
2    42 no          1           1          
3    84 yes         0           0          

Overall, my solution works but has a few problems:

  1. I don't know how to pass the unique ids through bind_rows() which forces me to create a dummy id ("id") which can't match to the original id ("ids"). This forces me to use a cbind() (which I don't like) and manually remove the dummy id (using -1 slicing on rdf).

  2. The format of the columns are lost as my approach converts the integer columns to characters.

Any suggestions on how to improve my solution (especially using tidyverse based packages like tidyjson or tidyr)?

Rhymenoceros
  • 119
  • 1
  • 8
  • 1
    Your data type information is lost from the beginning when you do `c('yes', 0, 0)`. Maybe you can show your json file looks like. There should be better ways to read them in. – Psidom Jan 07 '18 at 17:05

1 Answers1

2

We can loop over the 'responses' column with map, spread it to 'wide' with convert = TRUE so that the column types, create that as a column with transmute and then unnest

library(tidyverse)
data %>% 
     transmute(ids, ind = map(responses, ~.x %>% 
                                  spread(name, value, convert = TRUE)))  %>%
     unnest
# A tibble: 3 x 4
#    ids columnName1 columnName2 columnName3
#   <int> <chr>             <int>       <int>
#1    23 yes                   1           0
#2    42 no                    1           1
#3    84 yes                   0           0

Or using the OP's code, we set the names of the list with 'ids' column, do the bind_rows and then spread

bind_rows(setNames(data$responses, data$ids), .id = 'ids') %>% 
            spread(name, value, convert = TRUE)
akrun
  • 874,273
  • 37
  • 540
  • 662