My scraper app requests a Vimeo search URL that has a query string attached:
'http://vimeo.com/search?q=angularjs'
When I load that URL in Chrome, I can see a number of elements that do not show up when I request() that URL from my scraper. The HTML that both Chrome and my scraper receive seems to be only the static elements, such as the nav bar and footer. When I try to access any elements that would be generated by Vimeo processing the query string search?q=angularjs, my scraper does not get the video gallery grid that shows up in Chrome. Here is my scraper so far:
var request = require('request'),
    cheerio = require('cheerio'),
    searchURL = 'http://vimeo.com/search?q=angularjs';
request(searchURL, function (err, resp, body) {
  if (err) throw err;
  // Parse the raw HTML returned by the server (Cheerio does not execute page JavaScript)
  var $ = cheerio.load(body);
  console.log($('#site_header .join a').text());
  console.log($('#page_header h1').text());
  // Inside .each(), `this` is a plain DOM element, so wrap it with $() before calling .attr()
  $('#browse_content .browse_videos li a').each(function () {
    console.log($(this).attr('href'));
  });
});
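One way to test whether the difference comes from the request itself is to make request() look more like a browser. This is only a guess at the cause: it assumes Vimeo varies its response on request headers such as User-Agent, which, as far as I know, request() does not send unless you supply one. The helper name buildSearchOptions and the header values are hypothetical; request() does accept an options object with url and headers in place of a plain URL string.

```javascript
// Hypothetical helper: build an options object for request() that mimics a browser.
// Assumption: Vimeo may serve different HTML depending on request headers;
// the header values below are illustrative, not values Vimeo is known to require.
function buildSearchOptions(query) {
  return {
    url: 'http://vimeo.com/search?q=' + encodeURIComponent(query),
    headers: {
      'User-Agent': 'Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 ' +
                    '(KHTML, like Gecko) Chrome/30.0.1599.101 Safari/537.36',
      'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8'
    }
  };
}

var searchOptions = buildSearchOptions('angularjs');
console.log(searchOptions.url); // http://vimeo.com/search?q=angularjs

// Usage with the scraper (needs the request and cheerio packages installed):
// request(searchOptions, function (err, resp, body) { /* parse body as before */ });
```

If the browse_content grid appears with these headers, the server is varying its response by request; if it still does not, the grid may instead be filled in by client-side JavaScript, which Cheerio never runs.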
After loading the body into $ with Cheerio, I run
console.log($('#site_header .join a').text());
which logs "Join" to the console. That works. Great. But if I run
console.log($('#page_header h1').text());
what gets logged to the console is "Please Try Again", which I assume means that the query could not be fulfilled. Yet when I look at that bit of HTML in the page source in Chrome, I see:
<header id="page_header">
<h1>Search videos for <mark class="txt_normal">angularjs</mark></h1>
</header>
Just to be certain, I ran
console.log($('html').html());
which gave me back an HTML page that is missing the browse_content div containing the video thumbnail gallery grid. This is why the following code logs nothing:
$('#browse_content .browse_videos li a').each(function () {
  // `this` is a plain DOM element; wrap it with $() to use .attr()
  console.log($(this).attr('href'));
});
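Before suspecting the selectors, it can help to check the raw body for the results container directly. The sketch below uses a hypothetical helper, describePage (not part of Cheerio or request), that reports whether a fetched page contains the browse_content grid or the "Please Try Again" heading, using plain string matching so it inspects the body exactly as the server sent it.

```javascript
// Hypothetical helper: classify a fetched Vimeo search page by its raw HTML.
// It looks for id="browse_content" (the results grid) and for the
// "Please Try Again" heading seen in the scraper's output.
function describePage(html) {
  var hasGrid = /id=["']browse_content["']/.test(html);
  var hasRetryMessage = html.indexOf('Please Try Again') !== -1;
  if (hasGrid) return 'results';
  if (hasRetryMessage) return 'retry-page';
  return 'unknown';
}

// Usage inside the request() callback: console.log(describePage(body));
console.log(describePage('<header id="page_header"><h1>Please Try Again</h1></header>'));
// retry-page
```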
So why won't Vimeo give my scraper the same content it gives Chrome for the same URL?