Unable to scrape the data using bs4

Question

I am trying to scrape the star rating for the "value" data from the Trip Advisor hotels but I am not able to get the data using class name: Below is the code which I have tried to use:

review_pages=requests.get("https://www.tripadvisor.com/Hotel_Review-g60745-d94367-Reviews-Harborside_Inn-Boston_Massachusetts.html")  
soup3=BeautifulSoup(review_pages.text,'html.parser')   
value=soup3.find_all(class_='hotels-review-list-parts-AdditionalRatings__bubbleRating--2WcwT')    
Value_1=soup3.find_all(class_="hotels-review-list-parts-AdditionalRatings__ratings--3MtoD")

When I am trying to capture the values it is returning an empty list. Any direction would be really helpful. I have tried mutiple class names which are in that page but I am getting various fields such as Data,reviews ect but I am not able to get the bubble ratings for only service.

QHarr · Answer 1 · 2019-03-07T05:24:37.407

You can use an attribute = value selector and pass the class in with its value as a substring with ^ starts with operator to allow for different star values which form part of the attribute value.

Or, more simply use the span type selector to select for the child spans.

.hotels-hotel-review-about-with-photos-Reviews__subratings--3DGjN span

In this line:

values=soup3.select('.hotels-hotel-review-about-with-photos-Reviews__subratings--3DGjN [class^="ui_bubble_rating bubble_"]')

The first part of the selector, when reading from left to right, is selecting for the parent class of those ratings. The following space is a descendant combinator combining the following attribute = value selector which gathers a list of the qualifying children. As mentioned, you can replace that with just using span.

Code:

import requests
from bs4 import BeautifulSoup
import re

review_pages=requests.get("https://www.tripadvisor.com/Hotel_Review-g60745-d94367-Reviews-Harborside_Inn-Boston_Massachusetts.html")  
soup3=BeautifulSoup(review_pages.content,'lxml')   
values=soup3.select('.hotels-hotel-review-about-with-photos-Reviews__subratings--3DGjN [class^="ui_bubble_rating bubble_"]')    #.hotels-hotel-review-about-with-photos-Reviews__subratings--3DGjN span
Value_1 = values[-1]
print(Value_1['class'][1])
stars = re.search(r'\d', Value_1['class'][1]).group(0)
print(stars)

Although I use re, I think it is overkill and you could simply use replace.

Unable to scrape the data using bs4

1 Answers1