
I'm getting all the values I need from the Yelp API, but it doesn't return the business's website. To get the URL I'm trying to scrape each business page, but I can't find the correct way to do so.

I saw this answer, but it doesn't print any link.

Any idea how to get this info?

My code:

from bs4 import BeautifulSoup
import requests

resp = requests.get("https://www.yelp.com/biz/casa-d-paco-newark")
soup = BeautifulSoup(resp.content, 'lxml')

for link in soup.find_all('a', href=True):
    print(link['href'])

Sample URL: https://www.yelp.com/biz/casa-d-paco-newark

1 Answer


For anyone looking for a solution that doesn't use Selenium:

import json
import requests
from bs4 import BeautifulSoup

response = requests.get('https://www.yelp.com/biz/casa-d-paco-newark')
soup = BeautifulSoup(response.content, 'lxml')

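# JavaScript-style literals; only needed if the payload were passed to eval(),
# since json.loads already understands false, true and null
false = False
true = True
null = None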

# business_url is None by default
business_url = None

# may not be the best solution, but it works so far
try:
    # get the script tag whose commented-out JSON payload holds the page data
    content = str(soup.body.find('div', {'id': 'wrap'}).find('div', {'class': 'main-content-wrap--full'}).script)

    # indices where the dict starts and ends (strips the surrounding comment markers)
    start = content.index('{"gaConfig": ')
    end = -12  # drops the 12 trailing characters (assumed to be '--></script>')

    # converts the string into a dict
    content = json.loads(content[start:end])
    business_url = content['bizDetailsPageProps']['bizContactInfoProps']['businessWebsite']['linkText']
    print(business_url)

# Catch the errors a missing tag, marker or key would raise
except (AttributeError, LookupError, TypeError, ValueError):
    print('no business website found')
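
If the fixed end = -12 offset ever stops matching the page, one option is to let the JSON decoder find the end of the object by itself. This is only a minimal sketch, assuming the payload is still a single JSON object that starts at the '{"gaConfig": ' marker. The helper names extract_embedded_json and dig are hypothetical, and the sample string is made up for illustration rather than copied from a real Yelp page:

```python
import json


def extract_embedded_json(text, marker='{"gaConfig": '):
    """Decode the first JSON object in `text` that starts at `marker`."""
    start = text.index(marker)  # raises ValueError if the marker is missing
    # raw_decode stops at the end of the object and ignores whatever follows it
    obj, _end = json.JSONDecoder().raw_decode(text, start)
    return obj


def dig(data, *keys):
    """Follow `keys` through nested dicts; return None if any step is missing."""
    for key in keys:
        if not isinstance(data, dict) or key not in data:
            return None
        data = data[key]
    return data


# Synthetic stand-in for str(script_tag); the real page markup may differ.
sample = (
    '<script type="application/json"><!--'
    '{"gaConfig": {}, "bizDetailsPageProps": {"bizContactInfoProps": '
    '{"businessWebsite": {"linkText": "example.com"}}}}'
    '--></script>'
)

payload = extract_embedded_json(sample)
website = dig(payload, "bizDetailsPageProps", "bizContactInfoProps",
              "businessWebsite", "linkText")
print(website)
```

With this approach a business without a website gives None from dig instead of raising, so the try/except is only needed for a missing marker or malformed JSON.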
Tomerikoo