I am looking to use a scraper to return Yelp reviews for motels within a town. What I need to be able to do is search the reviews for keywords, such as "mold", and get back the motel along with the review itself. I have some code (I am using JupyterHub), but it only seems to give me back the motel names.

import json
import requests  # HTTP client used for the API call and the page downloads
from bs4 import BeautifulSoup

api_key = '#insert key here'
headers = {'Authorization': 'Bearer %s' % api_key}
url = 'https://api.yelp.com/v3/businesses/search'
params = {'term':'motel','location':'Williamsburg, VA'}
req = requests.get(url, params=params, headers=headers)
parsed = json.loads(req.text)
businesses = parsed["businesses"]
business_url_list = [business["url"] for business in businesses]

print(businesses)

biznames=[]
for val in businesses:
    biznames.append(val['name'])

print(biznames)

review_list_master = []
for i in business_url_list:
    continue_search = True
    reviews_list = []
    while continue_search:
        html_doc = requests.get(i).content
        parsed_html = BeautifulSoup(html_doc, 'lxml')
        # Collect the review bodies (divs whose class is "review-content")
        target_rows_url = parsed_html.find_all('div', class_='review-content')
        for x in target_rows_url:
            new_text = x.text.strip().replace('\n','')
            date_break_point = new_text.find('    ')
            reviews_list.append(new_text[date_break_point+4:len(new_text)])
        # Follow the "next page" link if there is one; otherwise stop paging this business
        next_link = parsed_html.find('a', class_='u-decoration-none next pagination-links_anchor')
        if next_link is not None and next_link.get('href'):
            i = next_link.get('href')
        else:
            continue_search = False
    review_list_master.append(reviews_list)

print(target_rows_url)
print(parsed_html)

# Print each motel name followed by the reviews scraped for it
for name, reviews in zip(biznames, review_list_master):
    print(name)
    #print(len(reviews))
    for x in reviews:
        print(x)
        print()
    print("------")

Any suggestions would be greatly appreciated. I'm very much a novice at coding, and I've tried many different scrapers that I can't seem to get working.

Lindsay B

2 Answers

You seem to be using the wrong endpoint. If you want to analyze reviews, you would need to use the Business Reviews endpoint (https://www.yelp.com/developers/documentation/v3/business_reviews). The one you are currently using, 'https://api.yelp.com/v3/businesses/search', only lets you search for businesses, and review content is not one of the fields you can search by (even though that is what you want).
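
A minimal sketch of calling the reviews endpoint for one business, assuming the request path is `https://api.yelp.com/v3/businesses/{id}/reviews` with the same Bearer header as the search call, and that `{id}` is the `"id"` field of each entry in the search response's `"businesses"` list. The names `reviews_url` and `fetch_reviews` are made up for illustration, and the standard library's `urllib` is used only to avoid an extra dependency; `requests.get(url, headers=headers)` would work the same way.

```python
import json
import urllib.parse
import urllib.request

API_ROOT = "https://api.yelp.com/v3"


def reviews_url(business_id):
    """Build the Business Reviews URL for one business (hypothetical helper)."""
    # Percent-encode the id so it is safe to use as a single path segment.
    return f"{API_ROOT}/businesses/{urllib.parse.quote(business_id, safe='')}/reviews"


def fetch_reviews(business_id, api_key):
    """Return the list of review dicts Yelp sends back for one business."""
    request = urllib.request.Request(
        reviews_url(business_id),
        headers={"Authorization": f"Bearer {api_key}"},
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        payload = json.load(response)
    # Each entry in "reviews" has a "text" field (an excerpt, not the full review).
    return payload.get("reviews", [])
```

In the question's notebook this would be called once per search result, e.g. `fetch_reviews(business["id"], api_key)` inside a loop over `businesses`.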

Unfortunately, there is no trivial workaround: you will need to make a large number of API calls (one per business) to get the reviews, and then store them yourself. Once you have them stored (locally, in memory, or wherever), you can search through them for your own keywords.
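
One way to do the "store them yourself" step, sketched under the assumption that each search result has `"id"` and `"name"` keys and that you have some function (the `fetch` argument, e.g. a wrapper around the reviews endpoint) returning a business's reviews as dicts with a `"text"` key. `build_store`, `save_store` and `load_store` are hypothetical names, and JSON is just one convenient format for a notebook. Keying by id rather than by name avoids collisions between chain motels that share a name.

```python
import json


def build_store(businesses, fetch):
    """Collect every business's review texts into one dict keyed by business id.

    businesses: list of dicts with at least "id" and "name" keys.
    fetch: callable taking a business id and returning a list of review dicts
           that each have a "text" key (one API call per business).
    """
    store = {}
    for business in businesses:
        reviews = fetch(business["id"])
        store[business["id"]] = {
            "name": business["name"],
            "reviews": [review["text"] for review in reviews],
        }
    return store


def save_store(store, path):
    """Write the store to disk so the API calls do not have to be repeated."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(store, f, indent=2)


def load_store(path):
    """Read a previously saved store back into memory."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)
```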

Also, it unfortunately seems that the business_reviews endpoint only provides up to three reviews per business. I'm not sure why that limitation is in place (or whether it could be lifted with a paid API plan).

vasia

Your app can use the Business Search endpoint to harvest the URLs of matching motels; by making repeated calls to that API you can build a database of every motel on Yelp with its name and URL. Once you’ve built your database, because the Fusion API is limited to only the 3 latest reviews (as previously mentioned), you’d need to hit each URL with a custom scraper to get the reviews (perhaps with Beautiful Soup, or see other resources such as https://www.reviewsmaker.com/api/demo/yelp/).
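
A rough sketch of harvesting the catalog by paging through Business Search, assuming Yelp's `limit`/`offset` parameters behave as documented (at most 50 results per call, and no more than 1,000 results reachable in total) and that each response carries a `"total"` count. `page_offsets`, `search_page` and `harvest_catalog` are illustrative names, and `urllib` stands in for `requests`.

```python
import json
import urllib.parse
import urllib.request

SEARCH_URL = "https://api.yelp.com/v3/businesses/search"
PAGE_SIZE = 50      # assumed maximum value Yelp accepts for `limit`
MAX_RESULTS = 1000  # assumed cap on how deep `offset` + `limit` can reach


def page_offsets(total, page_size=PAGE_SIZE, cap=MAX_RESULTS):
    """Offsets needed to walk `total` results without going past the cap."""
    return list(range(0, min(total, cap), page_size))


def search_page(term, location, offset, api_key):
    """Fetch one page of Business Search results as a dict."""
    query = urllib.parse.urlencode(
        {"term": term, "location": location, "limit": PAGE_SIZE, "offset": offset}
    )
    request = urllib.request.Request(
        f"{SEARCH_URL}?{query}", headers={"Authorization": f"Bearer {api_key}"}
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.load(response)


def harvest_catalog(term, location, api_key):
    """Return {business_id: (name, url)} for every result Yelp lets you page to."""
    first = search_page(term, location, 0, api_key)
    pages = [first]
    # The first page reports how many results exist; fetch the remaining pages.
    for offset in page_offsets(first.get("total", 0))[1:]:
        pages.append(search_page(term, location, offset, api_key))
    catalog = {}
    for page in pages:
        for business in page.get("businesses", []):
            catalog[business["id"]] = (business["name"], business["url"])
    return catalog
```

Calling `harvest_catalog("motel", "Williamsburg, VA", api_key)` would produce the list of names and URLs for the scraper to visit.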

In a nutshell, you can’t do it with Business Search alone, and you’re greatly limited by the regular Reviews API. Use Business Search to harvest a catalog, and pair it with a scraper that parses the review body, checks whether the word "mold" appears in it (i.e. whether its index exists), and stores your results. That would be my approach.
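
The "check whether the word's index exists and store the results" step could look like this, assuming you already have (motel name, review text) pairs from whichever scraper or API call you use. `find_keyword` and `save_hits` are hypothetical names, and the CSV layout is just one choice that opens easily in a spreadsheet.

```python
import csv


def find_keyword(rows, keyword):
    """Keep (motel, review) pairs whose review contains `keyword`, ignoring case.

    review.find(kw) != -1 is the "index exists" check; `kw in review` is equivalent.
    """
    kw = keyword.lower()
    return [(motel, review) for motel, review in rows if review.lower().find(kw) != -1]


def save_hits(hits, path):
    """Write the matching (motel, review) pairs to a CSV file with a header row."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["motel", "review"])
        writer.writerows(hits)
```

For example, `save_hits(find_keyword(rows, "mold"), "mold_reviews.csv")` (the filename is arbitrary). Because this is a plain substring test, "mold" also matches "moldy" (useful) and "crown molding" (a false positive); if that becomes a problem, a regular expression such as `r"\bmold(y|ed)?\b"` with `re.IGNORECASE` is stricter.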

Ilan P