Creating Dataframe from a json file

Question

I want to create a proper data frame by reading a JSON file. The resulting data frame displays correctly, but the dplyr function group_by() does not work on it. This is probably because str() shows every column of the data frame as a list of strings rather than a character vector. I am trying the following:

    require(jsonlite)
    require(dplyr)  # provides %>%, slice() and group_by()

    train_file = 'train.json'

    train_data <- fromJSON(train_file)

    rb = data.frame(sapply(train_data,c), stringsAsFactors = FALSE)

    rbs = rb %>% slice(1:10)

    rbsg = rbs %>%
      group_by(colname)

This gives the following error:

    Error: cannot group column colname, of class 'list'

Specifically, the file I am trying to read is train.json from this Kaggle competition:

https://www.kaggle.com/c/two-sigma-connect-rental-listing-inquiries/data

Yes, the JSON file is deeply nested. I am giving tidyjson a look. I tried jsonlite, RJSONIO, and many others. All led to the same problem. - Arpit Goel, Feb 09 '17 at 23:10

score 1 · Accepted Answer · answered Feb 10 '17 at 11:48

You need to unnest() the column of interest before operating on it (e.g. before using group_by() or other dplyr verbs):

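    library(jsonlite)
    library(tidyverse)

    # Read the JSON file and combine the result into one data frame
    rbs <- fromJSON("train.json") %>%
      bind_rows()

    # Flatten the list column first, then group by it
    rbsg <- rbs %>%
      unnest(bedrooms) %>%
      group_by(bedrooms)

    # The same applies before filtering on a list column
    rbs_filtered <- rbs %>%
      unnest(bathrooms) %>%
      filter(bathrooms > 5)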
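
As a minimal sketch of what unnest() does here, assuming a column in which each cell holds a single value wrapped in a list: the rbs_demo table and its listing_id and bedrooms values below are made up for illustration and are not taken from train.json.

```r
library(dplyr)
library(tidyr)

# Hypothetical stand-in for the parsed data: `bedrooms` is a list column
# holding one length-1 numeric vector per row (values are made up).
rbs_demo <- tibble(
  listing_id = c(101, 102, 103),
  bedrooms   = list(3, 2, 3)
)

# Before unnesting, the column is a list rather than an atomic vector.
is.list(rbs_demo$bedrooms)

# unnest() flattens the list column into a plain numeric column.
flat <- rbs_demo %>%
  unnest(bedrooms)
is.numeric(flat$bedrooms)

# Grouping and summarising now operate on ordinary values.
counts <- flat %>%
  group_by(bedrooms) %>%
  summarise(n = n())
```

Checking a column with is.list() or str() before and after unnest() is a quick way to confirm that it has become an atomic vector that group_by() and filter() can work with.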
