I am trying to write a script in JavaScript using Horseman (node-horseman) that pulls the HTML from each link stored in an array of URLs.

The basic idea is as follows, and it logs one URL's HTML just fine:

var Horseman = require('node-horseman');

var result = "";
var pulledHtml = "";

var horseman = new Horseman();
horseman
  .open(website)
  .html()
  .then(function(html){
    pulledHtml = html;
    result += pulledHtml;
    return result;
  })
  .log()
  .close();
console.log(result);

The problem comes when I try to loop this. For example:

var result = "";
var pulledHtml = "";
var website = ["www.example1.com","www.example2.com"]; //(etc)
var horseman = new Horseman();

for (var i = 0; i < website.length; i++) {
  horseman
    .open(website[i])
    .html()
    .then(function(html){
      pulledHtml = html;
      result += pulledHtml;
      return result;
    })
    .log()
    .close();
}
console.log(result);

Now I know this loop is doomed to fail. The problem is that I am too new to Horseman to have a good grasp of how to fix the issues in the code above.

The issues are:

1.) The for loop is synchronous while Horseman is asynchronous, so the loop calls Horseman again before it has finished pulling the HTML from the current URL.

2.) I can't seem to nail down how best to pass the HTML I find into a new variable, and I am certain I am not doing it well (the end goal is to have all the HTML saved in one variable).
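
Issue 1 can be reproduced without Horseman at all. The sketch below is only an illustration: `fakeFetchHtml` is a hypothetical stand-in (not part of node-horseman) that resolves on a later tick, the way a page load would. The loop finishes, and the length of `result` is read, before any `.then()` callback has run, which is why a log placed right after the loop shows nothing.

```javascript
// Hypothetical stand-in for an asynchronous page fetch; NOT a Horseman API.
function fakeFetchHtml(url) {
  return new Promise(function (resolve) {
    setTimeout(function () {
      resolve('<html>' + url + '</html>');
    }, 10);
  });
}

var websites = ['www.example1.com', 'www.example2.com'];
var result = '';

// The loop only schedules the work; it does not wait for it.
for (var i = 0; i < websites.length; i++) {
  fakeFetchHtml(websites[i]).then(function (html) {
    result += html;
  });
}

// This line runs before any of the .then() callbacks above.
var lengthRightAfterLoop = result.length;
console.log('length right after loop:', lengthRightAfterLoop); // prints 0
```

Anything that needs the combined HTML therefore has to run inside a callback that fires after the last fetch has completed.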

Methods I have tried so far to fix my first issue are

var chain = horseman;

for (var i = 0; i < website.length; i++) {
  chain = horseman
    .open(website[i])
    .html()
    .then(function(html){
      pulledHtml = html;
      result += pulledHtml;
      return result;
    })
    .log()
    .close();
}

But I don't have a good way to test this one, as logging result with console.log after the loop doesn't print anything (I will be doing things with the result variable later).

However, this version appears to wait, as the console.log doesn't instantly print an empty result variable like in the doomed for-loop example.
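
One way to make each URL wait for the previous one is to extend a single promise chain on every iteration and to read the combined HTML only in the final `.then()`. The following is a minimal sketch under stated assumptions: `fakeFetchHtml` and `collectSequentially` are hypothetical names, and the fake fetcher stands in for whatever actually loads the page. With Horseman, the fetcher might wrap `horseman.open(url).html()`, with `.close()` called once after the last step; that wiring is an assumption and is not exercised here.

```javascript
// Hypothetical stand-in for an asynchronous page fetch; NOT a Horseman API.
function fakeFetchHtml(url) {
  return new Promise(function (resolve) {
    setTimeout(function () {
      resolve('<html>' + url + '</html>');
    }, 10);
  });
}

// Fetch the URLs one after another and resolve with all HTML concatenated.
function collectSequentially(urls, fetchHtml) {
  return urls.reduce(function (chain, url) {
    // Each step waits for the previous one before starting its own fetch.
    return chain.then(function (combined) {
      return fetchHtml(url).then(function (html) {
        return combined + html;
      });
    });
  }, Promise.resolve(''));
}

collectSequentially(['www.example1.com', 'www.example2.com'], fakeFetchHtml)
  .then(function (result) {
    // The combined HTML is only available here, after every fetch is done.
    console.log(result);
  });
```

Passing the accumulated string through the chain avoids the shared `result` and `pulledHtml` variables entirely.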

Lastly, I have tried:

async.each(website, function(item, callback) {
  horseman
    .open(item)
    .html()
    .then(function(html){
      pulledHtml = html;
      result += pulledHtml;
      return result;
    })
    .log()
    .close();
  callback();
});
console.log(result);

Still to no avail.
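
In the `async.each` attempt, `callback()` runs as soon as the Horseman chain has been started rather than when it finishes, and the `console.log` sits outside the final callback that `async.each(coll, iteratee, callback)` accepts as its third argument. The sketch below shows the intended shape only. To stay self-contained it uses `eachStandIn`, a hypothetical mini version with the same call shape as `async.each`, and `fakeFetchHtml`, a hypothetical fetcher that is not a Horseman API.

```javascript
// Hypothetical stand-in with the same call shape as async.each.
function eachStandIn(items, iteratee, done) {
  var remaining = items.length;
  var failed = false;
  if (remaining === 0) { return done(null); }
  items.forEach(function (item) {
    iteratee(item, function (err) {
      if (failed) { return; }
      if (err) { failed = true; return done(err); }
      remaining -= 1;
      if (remaining === 0) { done(null); }
    });
  });
}

// Hypothetical stand-in for an asynchronous page fetch; NOT a Horseman API.
function fakeFetchHtml(url) {
  return new Promise(function (resolve) {
    setTimeout(function () {
      resolve('<html>' + url + '</html>');
    }, 10);
  });
}

function collectWithEach(urls, onDone) {
  var result = '';
  eachStandIn(urls, function (url, callback) {
    fakeFetchHtml(url).then(function (html) {
      result += html;
      callback(); // signal completion only after the HTML has arrived
    }, callback); // forward a fetch failure as the error argument
  }, function (err) {
    onDone(err, result); // fires once, after every item has called back
  });
}

collectWithEach(['www.example1.com', 'www.example2.com'], function (err, result) {
  if (err) { return console.error(err); }
  console.log(result);
});
```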

Many thanks if anyone can help me out with this!

Tahum

1 Answer

I don't know the Horseman API, but because it uses the .then() function I assume it returns a Promise.

Try this:

var Horseman = require('node-horseman');

var result = "";
var website = ["www.example1.com","www.example2.com"]; //(etc)
var promises = [];

// Start one Horseman instance per URL and keep each pending chain.
website.forEach(url => {
  promises.push(new Horseman()
    .open(url)
    .html());
});

// Wait for every page, then concatenate the HTML.
Promise.all(promises)
.then(results => {
  results.forEach(html => {
    result += html;
  });
})
.then(() => {
  promises.forEach(horse => {
    horse.log().close();
  });
  console.log(result);
});
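
A detail worth knowing about this approach is that `Promise.all` resolves with the values in the order of the input array, not in the order the pages happen to finish, so the concatenated HTML keeps the order of the URL list. The sketch below illustrates that with a hypothetical `fakeFetchHtml` helper (not a Horseman API) whose `delayMs` parameter is invented here to make the first page finish last.

```javascript
// Hypothetical stand-in for an asynchronous page fetch; NOT a Horseman API.
// delayMs is an illustrative parameter that simulates load time.
function fakeFetchHtml(url, delayMs) {
  return new Promise(function (resolve) {
    setTimeout(function () {
      resolve('<html>' + url + '</html>');
    }, delayMs);
  });
}

// Start every fetch at once and join the HTML in input order.
function collectInParallel(pages) {
  var pending = pages.map(function (page) {
    return fakeFetchHtml(page.url, page.delayMs);
  });
  return Promise.all(pending).then(function (htmls) {
    return htmls.join('');
  });
}

collectInParallel([
  { url: 'www.example1.com', delayMs: 30 }, // finishes last
  { url: 'www.example2.com', delayMs: 5 }   // finishes first
]).then(function (result) {
  // prints <html>www.example1.com</html><html>www.example2.com</html>
  console.log(result);
});
```

Note that `Promise.all` rejects as soon as any one fetch fails, so a single bad URL would discard the whole batch unless each fetch handles its own error.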