
I have downloaded a set of JSON files from the YELP public data challenge found here: https://www.yelp.com/dataset/challenge

They provide NDJSON formatted files. I've been able to read them using

library(jsonlite)
df <- stream_in(file("file_path"))

Unfortunately, some attribute columns still appear to be nested data.frames, and I cannot parse them out without creating each new column by hand.

Example:

df$attributes$BusinessParking is a character column containing:

{'garage': False, 'street': True, 'validated': False, 'lot': False, 'valet': False}

There are NA values in this column. I'd like to be able to parse this out into 5 binary columns. Is there a way to do this that I'm missing? I'm new to R but I've done some digging and haven't come across any solutions.

Artem
Jeremiah
    We don't have access to the data. The link you provide requires us to register with a name & email address which I imagine few people would bother with just to answer a question on SO. I actually did put in my details, but then realised that the JSON file is 3.13 GB in size. Bottom line is you're not making it easy for people to help. You should post a representative sample of the data so that we have something manageable to work with. – Maurits Evers Sep 11 '18 at 22:56
    see https://stackoverflow.com/questions/16947643/getting-imported-json-data-into-a-data-frame – fishtank Sep 11 '18 at 23:30

1 Answer


You can simply reassign the columns using the $ accessor operator. Because the full dataset is about 3 GB (see Maurits Evers' comment), I created an example based on a data sample available from Yelp Dataset JSON, business.json (included at the end of this post). You will also need to collapse categories into a single character string per business using paste0, so that each JSON entity stays a single record instead of spanning several lines.

yelp.R

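library(jsonlite)

# Read the sample file; nested JSON objects become nested data frames
df <- jsonlite::fromJSON("business.json")

# Pull the nested values up to the top level
df$RestaurantsTakeOut <- df$attributes$RestaurantsTakeOut
df_bp <- df$attributes$BusinessParking   # garage, street, validated, lot, valet
df_wh <- df$hours                        # one column per weekday
df <- cbind(df, df_bp, df_wh)

# Collapse each business's categories into one comma-separated string
df$categories <- sapply(df$categories, paste0, collapse = ", ")

# Drop the nested columns that have now been expanded
df$attributes <- NULL
df$hours <- NULL

str(df)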

Output:

'data.frame':   2 obs. of  26 variables:
 $ business_id       : chr  "tnhfDv5Il8EaGSXZGiuQGg" "tnhfDv5Il8EaGSXZGiuQGg"
 $ name              : chr  "Garaje" "Garaje"
 $ neighborhood      : chr  "SoMa" "SoMa"
 $ address           : chr  "475 3rd St" "475 3rd St"
 $ city              : chr  "San Francisco" "San Francisco"
 $ state             : chr  "CA" "CA"
 $ postal code       : chr  "94107" "94107"
 $ latitude          : num  37.8 37.8
 $ longitude         : num  -122 -122
 $ stars             : num  4.5 4.5
 $ review_count      : int  1198 1198
 $ is_open           : int  1 1
 $ categories        : chr  "Mexican, Burgers, Gastropubs" "Mexican, Burgers, Gastropubs"
 $ RestaurantsTakeOut: logi  TRUE TRUE
 $ garage            : logi  FALSE FALSE
 $ street            : logi  TRUE TRUE
 $ validated         : logi  FALSE FALSE
 $ lot               : logi  FALSE FALSE
 $ valet             : logi  FALSE FALSE
 $ Monday            : chr  "10:00-21:00" "10:00-21:00"
 $ Tuesday           : chr  "10:00-21:00" "10:00-21:00"
 $ Friday            : chr  "10:00-21:00" "10:00-21:00"
 $ Wednesday         : chr  "10:00-21:00" "10:00-21:00"
 $ Thursday          : chr  "10:00-21:00" "10:00-21:00"
 $ Sunday            : chr  "11:00-18:00" "11:00-18:00"
 $ Saturday          : chr  "10:00-21:00" "10:00-21:00"
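
As an alternative to reassigning each nested column by hand, jsonlite also provides flatten(), which expands nested data frames into top-level columns whose names are joined with a dot. The sketch below is only an illustration: the two NDJSON records are made up for the example, and a textConnection stands in for the file("file_path") connection used in the question.

```r
library(jsonlite)

# Two hypothetical NDJSON records (one JSON object per line),
# standing in for the real Yelp business file.
ndjson <- c(
  '{"name": "Garaje", "attributes": {"RestaurantsTakeOut": true, "BusinessParking": {"garage": false, "street": true}}}',
  '{"name": "Other", "attributes": {"RestaurantsTakeOut": false, "BusinessParking": {"garage": true, "street": false}}}'
)

# stream_in() reads line-delimited JSON from a connection
df <- stream_in(textConnection(ndjson), verbose = FALSE)

# flatten() turns the nested data frames into ordinary columns
flat <- flatten(df)
names(flat)
```

The flattened names keep the full path (for example attributes.BusinessParking.garage), so they can be renamed afterwards if shorter names are preferred.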

business.json

[{
    "business_id": "tnhfDv5Il8EaGSXZGiuQGg",

    "name": "Garaje",

    "neighborhood": "SoMa",

    "address": "475 3rd St",

    "city": "San Francisco",

    "state": "CA",

    "postal code": "94107",

    "latitude": 37.7817529521,

    "longitude": -122.39612197,

    "stars": 4.5,

    "review_count": 1198,

    "is_open": 1,

    "attributes": {
        "RestaurantsTakeOut": true,
        "BusinessParking": {
            "garage": false,
            "street": true,
            "validated": false,
            "lot": false,
            "valet": false
        }
    },

    "categories": [
        "Mexican",
        "Burgers",
        "Gastropubs"
    ],

    "hours": {
        "Monday": "10:00-21:00",
        "Tuesday": "10:00-21:00",
        "Friday": "10:00-21:00",
        "Wednesday": "10:00-21:00",
        "Thursday": "10:00-21:00",
        "Sunday": "11:00-18:00",
        "Saturday": "10:00-21:00"
    }
}, 
{
    "business_id": "tnhfDv5Il8EaGSXZGiuQGg",

    "name": "Garaje",

    "neighborhood": "SoMa",

    "address": "475 3rd St",

    "city": "San Francisco",

    "state": "CA",

    "postal code": "94107",

    "latitude": 37.7817529521,

    "longitude": -122.39612197,

    "stars": 4.5,

    "review_count": 1198,

    "is_open": 1,

    "attributes": {
        "RestaurantsTakeOut": true,
        "BusinessParking": {
            "garage": false,
            "street": true,
            "validated": false,
            "lot": false,
            "valet": false
        }
    },

    "categories": [
        "Mexican",
        "Burgers",
        "Gastropubs"
    ],

    "hours": {
        "Monday": "10:00-21:00",
        "Tuesday": "10:00-21:00",
        "Friday": "10:00-21:00",
        "Wednesday": "10:00-21:00",
        "Thursday": "10:00-21:00",
        "Sunday": "11:00-18:00",
        "Saturday": "10:00-21:00"
    }
}]
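
The sample above stores BusinessParking as a nested JSON object. In the question, the column instead holds character strings in Python dict syntax, such as {'garage': False, 'street': True, ...}, with some NA values. A minimal sketch for that case, assuming the strings contain only quoted keys and True/False/None values, is to rewrite each string as JSON and parse it. The helper name parse_py_dict and the keys vector are hypothetical, introduced only for this illustration.

```r
library(jsonlite)

# Hypothetical helper: convert Python-style dict strings into a data
# frame with one logical column per key. Assumes no apostrophes occur
# inside keys or values, since every single quote is replaced.
parse_py_dict <- function(x, keys) {
  # Rewrite Python literals as their JSON equivalents
  json <- gsub("'", "\"", x, fixed = TRUE)
  json <- gsub("\\bTrue\\b", "true", json, perl = TRUE)
  json <- gsub("\\bFalse\\b", "false", json, perl = TRUE)
  json <- gsub("\\bNone\\b", "null", json, perl = TRUE)

  rows <- lapply(json, function(s) {
    out <- setNames(rep(NA, length(keys)), keys)  # all-NA logical row
    # Leave the row as NA when the entry is missing or not a dict
    if (is.na(s) || !startsWith(s, "{")) return(out)
    vals <- fromJSON(s)
    for (k in keys) {
      v <- vals[[k]]
      if (is.logical(v) && length(v) == 1) out[k] <- v
    }
    out
  })

  as.data.frame(do.call(rbind, rows))
}

keys <- c("garage", "street", "validated", "lot", "valet")
parking <- c(
  "{'garage': False, 'street': True, 'validated': False, 'lot': False, 'valet': False}",
  NA,
  "None"
)
res <- parse_py_dict(parking, keys)
str(res)
```

On the real data, the result could be attached with something like cbind(df, parse_py_dict(df$attributes$BusinessParking, keys)); rows that are NA or not a dict remain NA in all five columns.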
Artem