
While following this YouTube scraping tutorial, https://www.youtube.com/watch?v=qbEN3boz7_M, I learned that instead of scraping the "public" page, which is loaded with a lot of unrelated content, you can sometimes find a "private" page that exposes just the necessary information and scrape that much more efficiently. The tutorial finds it with Inspect Element/Firebug:

Google Chrome > Inspect Element > Network > XHR

The presenter uses stock prices as an example and is able to locate a "private" page that can be scraped much more quickly and with less load on the server. However, when I looked at sites I wanted to scrape, for example http://www.rottentomatoes.com/m/grigris/, and went through Inspect Element (Chrome) > Network > XHR, checking each request's URL under Headers and its Preview, I didn't find anything useful.

Am I missing something? How can I determine whether raw or condensed information is hidden somewhere? Using the rottentomatoes.com page as an example, how can I tell whether there is 1) a "private page" that gives the title and year of the movie, and 2) a summary page (in a CSV-like format) that "stores" the titles and years of all the movies in one page?

KubiK888

1 Answer


You can only find XHR requests if the page loads data dynamically. In your example, the only request of note is to this URL:

http://www.rottentomatoes.com/api/private/v1.0/users/current/ratings/771355871

It returns some information about the movie as JSON:

{"media":{"type":"movie","id":771355871,"title":"Grigris","url":"http://www.rottentomatoes.com/m/grigris/","year":2014,"mpaa":"Unrated","runtime":"1 hr. 40 min.","synopsis":"Despite a bum leg, 25-year-old Grigris has hopes of becoming a professional dancer, making some extra cash putting his killer moves to good use on the...","thumbnail":"http://content6.flixster.com/movie/11/17/21/11172196_mob.jpg","cast":[{"name":"Souleymane Démé","id":"771446344"},{"name":"Anaïs Monory","id":"771446153"}]}}

Make sure the Chrome developer tools are open when you load the site; otherwise they don't capture any requests. If they were closed, open them and refresh the page, and the requests should then appear under the XHR filter.
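Once you have found an endpoint like this, you can read it directly instead of parsing the full HTML page. Here is a minimal sketch in Python, using only the standard library. The helper names `fetch_json` and `title_and_year` are made up for illustration, the `User-Agent` value is an assumption, and `SAMPLE` is a trimmed copy of the JSON above so that the parsing step runs without network access. I have not checked whether the endpoint needs cookies or other headers, so treat the fetch part as untested.

```python
import json
from urllib.request import Request, urlopen

# Trimmed copy of the JSON response shown above, so the parsing step runs offline.
SAMPLE = (
    '{"media":{"type":"movie","id":771355871,"title":"Grigris",'
    '"url":"http://www.rottentomatoes.com/m/grigris/","year":2014,'
    '"mpaa":"Unrated","runtime":"1 hr. 40 min."}}'
)


def fetch_json(url, timeout=10):
    """Download a URL and decode its body as JSON (hypothetical helper)."""
    # Assumption: a browser-like User-Agent; the site may require more than this.
    req = Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


def title_and_year(payload):
    """Pull the title and year out of a payload shaped like the one above."""
    media = payload["media"]
    return media["title"], media["year"]


# Parse the sample only; no request is made here.
print(title_and_year(json.loads(SAMPLE)))
```

To try it against the live site, you would pass the request URL from the XHR tab to `fetch_json` and hand the result to `title_and_year`. Whether the server answers a request made outside the browser is something you have to test.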

scandinavian_