Hello Everyone!

I am trying to build a web crawler with node-horseman, which makes it easier to work with PhantomJS, but I am stuck at one point.

Apparently, I can't run for loops inside .evaluate. Is that right?

The gist with my code:

https://gist.github.com/matheus-rossi/bc4c688264be072ded4ff7ee3f933bc2.js

As you can see, if I run exactly the same code in the browser, everything works fine, as in this image:

[Image: code running OK in the browser]

But if I run the code in node-horseman, I get this:

Unhandled rejection eval@[native code]
evaluate

global code
evaluateJavaScript@[native code]
evaluate@phantomjs://platform/webpage.js:390:39
phantomjs://code/bridge.js:121:61
at Horseman.<anonymous> (/home/matheus/Documentos/NodeJs/node-horseman/node_modules/node-horseman/lib/actions.js:839:38)
at Horseman.tryCatcher (/home/matheus/Documentos/NodeJs/node-horseman/node_modules/bluebird/js/release/util.js:16:23)
at Promise._settlePromiseFromHandler (/home/matheus/Documentos/NodeJs/node-horseman/node_modules/bluebird/js/release/promise.js:512:31)
at Promise._settlePromise (/home/matheus/Documentos/NodeJs/node-horseman/node_modules/bluebird/js/release/promise.js:569:18)
at Promise._settlePromiseCtx (/home/matheus/Documentos/NodeJs/node-horseman/node_modules/bluebird/js/release/promise.js:606:10)
at Async._drainQueue (/home/matheus/Documentos/NodeJs/node-horseman/node_modules/bluebird/js/release/async.js:138:12)
at Async._drainQueues (/home/matheus/Documentos/NodeJs/node-horseman/node_modules/bluebird/js/release/async.js:143:10)
at Immediate.Async.drainQueues (/home/matheus/Documentos/NodeJs/node-horseman/node_modules/bluebird/js/release/async.js:17:14)
at runCallback (timers.js:781:20)
at tryOnImmediate (timers.js:743:5)
at processImmediate [as _immediateCallback] (timers.js:714:5)

This is my code in index.js, which runs node-horseman:

var Horseman = require('node-horseman')
var horseman = new Horseman()

horseman
.open('http://www.angeloni.com.br/super/index')
.status()
.evaluate(function(){

const descNode = document.querySelectorAll('.descr a')
const desc = Array.prototype.map.call(descNode, function (t) { return t.textContent })

const valueNode = document.querySelectorAll('.price a')
const value = Array.prototype.map.call(valueNode, function (t) { return t.textContent })

const finalData = []

for (let i=0 ; i < desc.length; i ++) {
  let item = {}
  item['desc'] = desc[i]
  item['value'] = value[i]
  finalData.push(item)
}

return finalData

})
.then(function(finalData){
  console.log(finalData)
})
.close()

What am I missing?

Edit: after adding .catch to the promise chain, I got this new information:

  message: 'Expected an identifier but found \'item\' instead',

1 Answer

What you are missing is that PhantomJS runs JavaScript in a different environment from Node: the function you pass to .evaluate() is executed inside the PhantomJS page, not in your Node process. As in many browsers, not all of the newer ES6 language features are available in that environment (yet).

If I run your code, I get errors from PhantomJS caused by the use of let. Changing those declarations to var makes your code work for me.

Also, it's a good idea to add .catch() to the promise chain, because you'll get clearer errors, which would have been useful in this situation.
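For illustration, here is a rough sketch of how the .evaluate() callback from the question might look with the let declarations changed to var. The function name extractItems is only a label I made up for this example. I also changed const to var to stay within plain ES5, which is a precaution on my part rather than something I confirmed is required. The chain in the trailing comment is untested and only shows where .catch() could go.

```javascript
// Callback meant to be passed to horseman.evaluate().
// It is executed inside the PhantomJS page, so it sticks to ES5 syntax:
// var instead of let (and, as a precaution, instead of const).
// "extractItems" is a hypothetical name used only for this sketch.
function extractItems() {
  var descNode = document.querySelectorAll('.descr a');
  var desc = Array.prototype.map.call(descNode, function (t) { return t.textContent; });

  var valueNode = document.querySelectorAll('.price a');
  var value = Array.prototype.map.call(valueNode, function (t) { return t.textContent; });

  var finalData = [];

  // Pair each description with the price at the same index.
  for (var i = 0; i < desc.length; i++) {
    var item = {};
    item.desc = desc[i];
    item.value = value[i];
    finalData.push(item);
  }

  return finalData;
}

// Assumed usage, mirroring the chain from the question with .catch() added:
//
// horseman
//   .open('http://www.angeloni.com.br/super/index')
//   .evaluate(extractItems)
//   .then(function (finalData) { console.log(finalData); })
//   .catch(function (err) { console.error(err); })
//   .close();
```

The callback has to be self-contained, because it cannot see variables or helper functions defined in your Node script.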

Mark