
I'm trying to read in a large JSON file (stored as a .txt file) returned by the Yelp API and convert it into a data frame. My JSON file is in "pretty print" format; the first 3 JSON objects are below:

{
    "businesses": [
        {
            "address1": "11301 Wilshire Blvd", 
            "address2": "", 
            "address3": "", 
            "avg_rating": 3.0, 
            "categories": [
                {
                    "category_filter": "hospitals", 
                    "name": "Hospitals", 
                    "search_url": "http://www.yelp.com/search?cflt=hospitals&find_desc=&find_loc=11301+Wilshire+Blvd%2C+Los+Angeles+90073"
                }
            ], 
            "city": "Los Angeles", 
            "country": "USA", 
            "country_code": "US", 
            "distance": 0.0, 
            "id": "9yWDlJ5l1i6O36Fxp5JIBw", 
            "is_closed": false, 
            "mobile_url": "http://m.yelp.com/biz/west-los-angeles-medical-center-los-angeles-2", 
            "name": "West Los Angeles Medical Center", 
            "nearby_url": "http://www.yelp.com/search?find_desc=&find_loc=11301+Wilshire+Blvd%2C+Los+Angeles+90073", 
            "neighborhoods": [], 
            "phone": "3104783711", 
            "photo_url": "http://media2.fl.yelpcdn.com/bpthumb/IZ82DgJAy8emp4dX7UvbUw/ms", 
            "photo_url_small": "http://media2.fl.yelpcdn.com/bpthumb/IZ82DgJAy8emp4dX7UvbUw/ss", 
            "rating_img_url": "http://s3-media3.fl.yelpcdn.com/assets/2/www/img/34bc8086841c/ico/stars/v1/stars_3.png", 
            "rating_img_url_small": "http://s3-media3.fl.yelpcdn.com/assets/2/www/img/902abeed0983/ico/stars/v1/stars_small_3.png", 
            "review_count": 40, 
            "reviews": [
                {
                    "date": "2014-11-25", 
                    "id": "wO4jShjiPoWDBR_OV3cGmQ", 
                    "mobile_uri": "/biz/west-los-angeles-medical-center-los-angeles-2?full=True&hrid=wO4jShjiPoWDBR_OV3cGmQ", 
                    "rating": 5, 
                    "rating_img_url": "http://s3-media1.fl.yelpcdn.com/assets/2/www/img/f1def11e4e79/ico/stars/v1/stars_5.png", 
                    "rating_img_url_small": "http://s3-media1.fl.yelpcdn.com/assets/2/www/img/c7623205d5cd/ico/stars/v1/stars_small_5.png", 
                    "text_excerpt": "I dropped my super expensive insurance, that is not feasible to afford now, and joined the VA.  Shortly after signing up it was found that I needed surgery....", 
                    "url": "http://www.yelp.com/biz/west-los-angeles-medical-center-los-angeles-2?hrid=wO4jShjiPoWDBR_OV3cGmQ", 
                    "user_name": "Dj D.", 
                    "user_photo_url": "http://media1.fl.yelpcdn.com/upthumb/s1RKYlrzhKCZs_zSS0cVOA/ms", 
                    "user_photo_url_small": "http://media1.fl.yelpcdn.com/upthumb/s1RKYlrzhKCZs_zSS0cVOA/ss", 
                    "user_url": "http://www.yelp.com/user_details?userid=cJyDfLw9uJT63MwFgz7XnA"
                }
            ], 
            "state": "CA", 
            "state_code": "CA", 
            "url": "http://www.yelp.com/biz/west-los-angeles-medical-center-los-angeles-2", 
            "zip": "90073"
        }, 
        {
            "address1": "11301 Wilshire", 
            "address2": "Bldg 306", 
            "address3": "", 
            "avg_rating": 3.0, 
            "categories": [
                {
                    "category_filter": "cafeteria", 
                    "name": "Cafeteria", 
                    "search_url": "http://www.yelp.com/search?cflt=cafeteria&find_desc=&find_loc=11301+Wilshire%2C+Los+Angeles+90073"
                }
            ], 
            "city": "Los Angeles", 
            "country": "USA", 
            "country_code": "US", 
            "distance": 0.0, 
            "id": "K8eEx2J3pF3b-w6EZwKY5w", 
            "is_closed": false, 
            "mobile_url": "http://m.yelp.com/biz/va-canteen-wla-los-angeles", 
            "name": "VA Canteen WLA", 
            "nearby_url": "http://www.yelp.com/search?find_desc=&find_loc=11301+Wilshire%2C+Los+Angeles+90073", 
            "neighborhoods": [], 
            "phone": "3104783711", 
            "photo_url": "http://s3-media2.fl.yelpcdn.com/assets/srv0/yelp_styleguide/5f69f303f17c/assets/img/default_avatars/business_medium_square.png", 
            "photo_url_small": "http://s3-media3.fl.yelpcdn.com/assets/srv0/yelp_styleguide/6671667140ef/assets/img/default_avatars/business_small_square.png", 
            "rating_img_url": "http://s3-media3.fl.yelpcdn.com/assets/2/www/img/34bc8086841c/ico/stars/v1/stars_3.png", 
            "rating_img_url_small": "http://s3-media3.fl.yelpcdn.com/assets/2/www/img/902abeed0983/ico/stars/v1/stars_small_3.png", 
            "review_count": 4, 
            "reviews": [
                {
                    "date": "2014-11-02", 
                    "id": "rzoQx7o9sla7ig3QZAjtUg", 
                    "mobile_uri": "/biz/va-canteen-wla-los-angeles?full=True&hrid=rzoQx7o9sla7ig3QZAjtUg", 
                    "rating": 3, 
                    "rating_img_url": "http://s3-media3.fl.yelpcdn.com/assets/2/www/img/34bc8086841c/ico/stars/v1/stars_3.png", 
                    "rating_img_url_small": "http://s3-media3.fl.yelpcdn.com/assets/2/www/img/902abeed0983/ico/stars/v1/stars_small_3.png", 
                    "text_excerpt": "This place serves its function. There are a few stations where you can grab food if you don't want to venture off the VA premises for lunch. However, the...", 
                    "url": "http://www.yelp.com/biz/va-canteen-wla-los-angeles?hrid=rzoQx7o9sla7ig3QZAjtUg", 
                    "user_name": "James W.", 
                    "user_photo_url": "http://media1.fl.yelpcdn.com/upthumb/6UlTMXf0VkFmmmXwXe8Flg/ms", 
                    "user_photo_url_small": "http://media1.fl.yelpcdn.com/upthumb/6UlTMXf0VkFmmmXwXe8Flg/ss", 
                    "user_url": "http://www.yelp.com/user_details?userid=qgXcgfdrk5tzmLBq4_h6mQ"
                }
            ], 
            "state": "CA", 
            "state_code": "CA", 
            "url": "http://www.yelp.com/biz/va-canteen-wla-los-angeles", 
            "zip": "90073"
        }, 
        {
            "address1": "11301 Wilshire", 
            "address2": "Bldg 306", 
            "address3": "", 
            "avg_rating": 2.0, 
            "categories": [
                {
                    "category_filter": "cafeteria", 
                    "name": "Cafeteria", 
                    "search_url": "http://www.yelp.com/search?cflt=cafeteria&find_desc=&find_loc=11301+Wilshire%2C+Los+Angeles+90073"
                }
            ], 
            "city": "Los Angeles", 
            "country": "USA", 
            "country_code": "US", 
            "distance": 0.0, 
            "id": "4etl04G_-VwP8NJ2F3nu4w", 
            "is_closed": false, 
            "mobile_url": "http://m.yelp.com/biz/va-canteen-wla-2-los-angeles", 
            "name": "VA Canteen WLA 2", 
            "nearby_url": "http://www.yelp.com/search?find_desc=&find_loc=11301+Wilshire%2C+Los+Angeles+90073", 
            "neighborhoods": [], 
            "phone": "3104783711", 
            "photo_url": "http://s3-media2.fl.yelpcdn.com/assets/srv0/yelp_styleguide/5f69f303f17c/assets/img/default_avatars/business_medium_square.png", 
            "photo_url_small": "http://s3-media3.fl.yelpcdn.com/assets/srv0/yelp_styleguide/6671667140ef/assets/img/default_avatars/business_small_square.png", 
            "rating_img_url": "http://s3-media2.fl.yelpcdn.com/assets/2/www/img/b561c24f8341/ico/stars/v1/stars_2.png", 
            "rating_img_url_small": "http://s3-media2.fl.yelpcdn.com/assets/2/www/img/a6210baec261/ico/stars/v1/stars_small_2.png", 
            "review_count": 1, 
            "reviews": [
                {
                    "date": "2014-02-01", 
                    "id": "G9Qr5OpQHs0qo89LFzYIGA", 
                    "mobile_uri": "/biz/va-canteen-wla-2-los-angeles?full=True&hrid=G9Qr5OpQHs0qo89LFzYIGA", 
                    "rating": 2, 
                    "rating_img_url": "http://s3-media2.fl.yelpcdn.com/assets/2/www/img/b561c24f8341/ico/stars/v1/stars_2.png", 
                    "rating_img_url_small": "http://s3-media2.fl.yelpcdn.com/assets/2/www/img/a6210baec261/ico/stars/v1/stars_small_2.png", 
                    "text_excerpt": "Mon-fri 7am to 1:30 pm\n\nBreakfast and Lunch\n\nThe grill team is good, Grill masters!\n\nCoffee is ok", 
                    "url": "http://www.yelp.com/biz/va-canteen-wla-2-los-angeles?hrid=G9Qr5OpQHs0qo89LFzYIGA", 
                    "user_name": "Patrick D.", 
                    "user_photo_url": "http://media1.fl.yelpcdn.com/upthumb/essl4VxDB599GHCamIdDdA/ms", 
                    "user_photo_url_small": "http://media1.fl.yelpcdn.com/upthumb/essl4VxDB599GHCamIdDdA/ss", 
                    "user_url": "http://www.yelp.com/user_details?userid=B7VkaAqckBslmw5HtstA1A"
                }
            ], 
            "state": "CA", 
            "state_code": "CA", 
            "url": "http://www.yelp.com/biz/va-canteen-wla-2-los-angeles", 
            "zip": "90073"
        }
    ], 
    "message": {
        "code": 0, 
        "text": "OK", 
        "version": "1.1.1"
    }
}
{
    "businesses": [], 
    "message": {
        "code": 0, 
        "text": "OK", 
        "version": "1.1.1"
    }
}
{
    "businesses": [], 
    "message": {
        "code": 0, 
        "text": "OK", 
        "version": "1.1.1"
    }
} 

I've tried the following R code:

library(dplyr)
library(plyr)
library(jsonlite) 

df <- fromJSON(paste(readLines("Yelp facility pretty print v2.txt"), collapse="")) 

But this only returns the first JSON object.

I then tried:

df <- fromJSON(sprintf("[%s]", paste(readLines("Yelp facility pretty print v2.txt"), collapse=",")))

But this returns an error "...unexpected character ","; expecting opening string quote(") for key value."

I verified my JSON file doesn't have a blank line in it. Any suggestions/help is greatly appreciated!

kdopen
Mike C
    The response from Yelp doesn't look like a valid JSON object. There is no valid way in JSON to have the sequence `}{` outside of a quoted string. Your attempt with `sprintf` to turn it into an array won't work, because it doesn't insert commas between the objects. – kdopen Feb 18 '15 at 17:45
    Stop using the `readLines`. Just pass the file path to `fromJSON`. – stanekam Feb 18 '15 at 19:46

1 Answer


Your problem is malformed JSON, as explained in this answer: https://stackoverflow.com/a/34714966/5258043

Your input JSON is malformed, and has multiple elements at the root level. This is akin to defining an XML document with more than one root, which is of course not allowed.

The proper way to read your file is:

my_data <- rjson::fromJSON(file='./yelp.txt')

However, it fails because of the multiple root elements. You can either delete everything after the first element, or wrap everything in one big root by adding `[` to the top and `]` to the bottom of the text file and separating the objects with commas, so that every entry in your JSON becomes its own list element.

Note: you could also use the jsonlite package. I used rjson because its default parsing creates a nicer list, but either package works; it's your preference.
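
If editing the file by hand is impractical, the same idea can be done in R. The following is only a minimal sketch. It assumes the file is pretty-printed exactly as in the question, so that every top-level object begins on a line containing nothing but `{`. The helper names `split_json_objects` and `read_concatenated_json` are made up for illustration and are not part of jsonlite or rjson.

```r
library(jsonlite)

# Split the lines of a file holding several concatenated, pretty-printed JSON
# objects into one string per top-level object.
# Assumption: each top-level object opens with a line that is just "{" at
# column 0 (nested braces are indented), as in the sample in the question.
split_json_objects <- function(lines) {
  is_start <- grepl("^\\{\\s*$", lines)   # TRUE on the first line of each object
  group <- cumsum(is_start)                # object index for every line
  keep <- group > 0                        # drop anything before the first "{"
  chunks <- split(lines[keep], group[keep])
  vapply(chunks, paste, character(1), collapse = "\n", USE.NAMES = FALSE)
}

# Read the file and parse each top-level object separately.
# Returns a list with one parsed object per API response.
read_concatenated_json <- function(path) {
  objs <- split_json_objects(readLines(path, warn = FALSE))
  lapply(objs, fromJSON)
}
```

Calling `read_concatenated_json("Yelp facility pretty print v2.txt")` should then give one list element per response, and `lapply(result, function(x) x$businesses)` pulls out the business tables. Responses with an empty `businesses` array come back as empty lists, so they may need to be filtered out before the remaining data frames are combined (for example with `jsonlite::rbind_pages()`), and nested columns such as `reviews` may need flattening first.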

Community
Steven M. Mortimer