I am using cheerio to scrape about 800 websites, just to get each site's title. The first issue is that I sometimes get an error message saying "We’ve encountered an error: Error: socket hang up". Secondly, maybe because of cheerio's asynchronous nature, when I log the created objects they all have the address of the last web address in the array. Finally, I log the array that I have been pushing the objects into, but it is logged immediately as [], because that line runs before anything else completes. How can I fix these three issues? I've been
var tempArr = [];
var completedLinks = ["http://www.example.com/page1", "http://www.example.com/page2", "http://www.example.com/page3"...];
for (var foundLink in completedLinks) {
    if (ValidURL(completedLinks[foundLink])) {
        request(completedLinks[foundLink], function (error, response, body) {
            if (!error) {
                var $ = cheerio.load(body);
                var titles = $("title").text();
                var tempObj = {};
                tempObj.title = titles;
                tempObj.address = completedLinks[foundLink];
                tempArr.push(tempObj);
                console.log(tempObj);
            } else {
                console.log("We’ve encountered an error: " + error);
            }
        });
    }
}
console.log(tempArr);
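
For reference, here is a minimal sketch of one way the three symptoms above might be addressed together. It rests on assumptions that are not confirmed by the code above: that the "socket hang up" errors come from opening all ~800 connections at once, that the repeated address comes from the callback reading the shared `var foundLink` after the loop has finished, and that the empty array is logged because the callbacks have not run yet. The names `fetchTitle` and `collectTitles` and the limit of 10 are hypothetical, chosen only for illustration.

```javascript
// Hypothetical helper: wraps request + cheerio in a Promise that resolves to the page title.
function fetchTitle(url) {
  // Required lazily so the rest of the sketch runs even without these packages installed.
  const request = require("request");
  const cheerio = require("cheerio");
  return new Promise(function (resolve, reject) {
    request(url, function (error, response, body) {
      if (error) return reject(error);
      resolve(cheerio.load(body)("title").text());
    });
  });
}

// Hypothetical helper: calls getTitle for every URL with at most `limit` requests in flight,
// and resolves only after every request has either finished or failed.
async function collectTitles(urls, getTitle, limit) {
  const results = [];
  let next = 0; // index of the next URL that has not been started yet

  async function worker() {
    while (next < urls.length) {
      const url = urls[next++]; // block-scoped, so each iteration keeps its own URL
      try {
        results.push({ title: await getTitle(url), address: url });
      } catch (error) {
        console.log("We've encountered an error for " + url + ": " + error);
      }
    }
  }

  // Start `limit` workers; each one pulls the next URL as soon as it is free.
  const workers = [];
  for (let i = 0; i < Math.min(limit, urls.length); i++) {
    workers.push(worker());
  }
  await Promise.all(workers);
  return results;
}
```

With the code from the question, this could be called as `collectTitles(completedLinks.filter(ValidURL), fetchTitle, 10).then(function (tempArr) { console.log(tempArr); });`, so the array is logged only once all requests are done. Results arrive in completion order, not in the order of the input array.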