I want to try my hand at web scraping. I've noticed that AngleSharp is pretty good for the .NET environment. I'm trying to get a list of all the descriptions and ratings from a Yelp page, but I get neither errors nor results. Here's a subset of what the HTML looks like (the full page is at https://www.yelp.ca/biz/walmart-toronto-12):

<div class="rating-very-large">
    <i class="star-img stars_2" title="2.0 star rating">
        <img alt="2.0 star rating" class="offscreen" height="303" src="//s3-media4.fl.yelpcdn.com/assets/srv0/yelp_styleguide/c2252a4cd43e/assets/img/stars/stars_map.png" width="84">
    </i>
        <meta itemprop="ratingValue" content="2.0">
</div>
<p itemprop="description" lang="en">This Walmart still terrifies me<br><br>Baby things can be found on the back right of the lower level. Godspeed.</p> 

<div class="rating-very-large">
    <i class="star-img stars_1" title="1.0 star rating">
        <img alt="1.0 star rating" class="offscreen" height="303" src="//s3-media4.fl.yelpcdn.com/assets/srv0/yelp_styleguide/c2252a4cd43e/assets/img/stars/stars_map.png" width="84">
    </i>
        <meta itemprop="ratingValue" content="1.0">
</div>
<p itemprop="description" lang="en">Wow I don&#39;t even know where to begin, </p> 

Here's my query:

var config = new Configuration().WithJavaScript().WithCss();
var parser = new HtmlParser(config);
var document = await BrowsingContext.New(config).OpenAsync("https://www.yelp.ca/biz/walmart-toronto-12");

//Do something with LINQ
var descriptionListItemsLinq = document.All.Where(m => m.LocalName == "p" && m.Id.Contains("description"));
foreach (var element in descriptionListItemsLinq)
{
    element.Text().Dump();
}

How do I get a list of the user reviews (descriptions) and ratings?

inquisitive_one

1 Answer

I checked the HTML source of https://www.yelp.ca/biz/walmart-toronto-12. As I expected, the user reviews are embedded in the page as JSON, so you should not use AngleSharp in this scenario.

[Screenshot: the JSON as it appears in the HTML source]

[Screenshot: a parsed version of the same JSON]

It's JSON, and you can deserialize it with Newtonsoft.Json. Just extract the JSON and read what you need from it.
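
Here is a minimal sketch of that idea, not a tested scraper for the live page. It assumes the reviews sit in a `<script type="application/ld+json">` block, and that each review has `reviewRating.ratingValue` and `description` fields. Those key names are guesses modeled on the `itemprop` attributes in the question, so check them against the actual page source. `ReviewExtractor` and `SampleHtml` are hypothetical names, and the sample markup is made up for illustration.

```csharp
using System;
using System.Text.RegularExpressions;
using Newtonsoft.Json.Linq;

public static class ReviewExtractor
{
    // Hypothetical sample standing in for the downloaded page source.
    // The real page's JSON layout may differ.
    const string SampleHtml = @"<html><head>
<script type=""application/ld+json"">
{""review"": [
  {""reviewRating"": {""ratingValue"": 2}, ""description"": ""This Walmart still terrifies me""},
  {""reviewRating"": {""ratingValue"": 1}, ""description"": ""Wow I don't even know where to begin""}
]}
</script></head><body></body></html>";

    public static void Main()
    {
        // Capture the body of the first JSON script block
        // (assumed here to be the one that holds the reviews).
        var match = Regex.Match(
            SampleHtml,
            @"<script type=""application/ld\+json"">(.*?)</script>",
            RegexOptions.Singleline);

        if (!match.Success)
        {
            Console.WriteLine("No JSON block found.");
            return;
        }

        // Parse the captured text into a navigable JSON tree.
        JObject root = JObject.Parse(match.Groups[1].Value);

        // Fall back to an empty array if the assumed "review" key is missing.
        foreach (JToken review in root["review"] ?? new JArray())
        {
            var rating = (double?)review.SelectToken("reviewRating.ratingValue");
            var text = (string)review["description"];
            Console.WriteLine($"{rating}: {text}");
        }
    }
}
```

For the real page, you would replace `SampleHtml` with the string you download (for example with `HttpClient.GetStringAsync`) and adjust the script selector and key names to whatever the source actually contains.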

Ali Bahrami