
I am essentially trying to scrape a page on the fly. When you hit this URL, it outputs the result of the scrape job. Everything works wonderfully the first time. The second time I try it (with different parameters passed through job.options.args), it won't even execute the node.io job's run() function, and scrape_result comes back empty (I expect an object).

Any thoughts? How can I ensure the new results get returned the second time? For my scrape job I'm using example #3 from here almost exactly: https://github.com/chriso/node.io/wiki/Scraping

Excerpt from scraper.js (the rest follows example #3):

run: function() {
    var book = this.options.args[0].book;
    var chapter = this.options.args[0].chapter;

    this.getHtml('http://www.url.com' + book + '/' + chapter + '?lang=eng', function(err, $) {

Then my app.js

var scrip_scraper = require('./scraper.js');

app.get('/verses/:book/:chapter', function (req, res) {
    var params = {
        book: req.param('book'),
        chapter: req.param('chapter')
    };

    scrip_scraper.job.options.args[0] = params;
    //scrip_scraper.job.options.args.push(chapter);
    console.log(scrip_scraper.job.options.args);



    nodeio.start(scrip_scraper, function (err, scrape_result) {

        console.log(scrape_result);
    }, true);

}); //app.get('/verses/:book/:chapter')
Jamis Charles
  • I think that in order to help you out, we need to see more of your code. How did you create `scrip_scraper`? I don't think `scrip_scraper.job.options.args[0] = params;` is doing what you want to do. – Max Oct 29 '12 at 20:38
  • @Max I've added some more code above. I couldn't figure out the proper way of passing arguments to my scrape job. Using options.args[0] was the best I could come up with. It works the first time around beautifully. The second time, run() doesn't even seem to execute. – Jamis Charles Oct 29 '12 at 21:14

1 Answer


You're probably running into scoping issues: scrip_scraper.job is a single object shared by every request, so options.args can change while an earlier request is still being processed. Try passing the input to the job as a function argument so it cannot be changed by another request. Here's an example that you could adapt to your needs:

app.js

var express = require('express')
  , scraper = require('./scraper')
  , app = express();

app.get('/:keyword', function (request, response, next) {
    scraper(request.param('keyword'), function (err, result) {
        if (err) {
            return next(err);
        }
        response.send(result);
    });
});

app.listen(3000);

scraper.js

var nodeio = require('node.io');

module.exports = function (keyword, callback) {
    var job = new nodeio.Job({
        input: [ keyword ]
      , run: function (keyword) {
            //Make the request here..
            this.emit(keyword);
        }
    });
    nodeio.start(job, { silent: true }, callback, true);
};
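
To illustrate why building the job per call helps, here is a minimal sketch of the shared-state hazard in plain JavaScript. It does not use node.io at all: `defer`, `flush`, `sharedJob`, `scrapeShared`, `scrapeIsolated` and `runTwoRequests` are hypothetical names made up for this illustration, and the manual queue only stands in for asynchronous work such as an HTTP request. It shows the general problem of two overlapping requests writing to one object, not node.io's internals.

```javascript
// Hypothetical stand-in for async work: callbacks are queued and run later.
var pending = [];
function defer(fn) { pending.push(fn); }
function flush() { while (pending.length) { pending.shift()(); } }

// Shared-state version: every request writes to the same object,
// much like assigning to scrip_scraper.job.options.args[0].
var sharedJob = { options: { args: [] } };
function scrapeShared(params, callback) {
    sharedJob.options.args[0] = params;
    defer(function () {
        // Reads whatever the most recent request stored.
        callback(null, sharedJob.options.args[0].book);
    });
}

// Per-call version: the input lives in this call's closure,
// so a later request cannot overwrite it.
function scrapeIsolated(params, callback) {
    var input = [params];
    defer(function () {
        callback(null, input[0].book);
    });
}

// Start two requests before either one finishes, then let both complete.
function runTwoRequests(scrape) {
    var results = [];
    scrape({ book: 'gen' }, function (err, book) { results.push(book); });
    scrape({ book: 'exo' }, function (err, book) { results.push(book); });
    flush();
    return results;
}

console.log(runTwoRequests(scrapeShared));   // both callbacks see 'exo'
console.log(runTwoRequests(scrapeIsolated)); // 'gen', then 'exo'
```

In the shared version the first request's parameters are lost as soon as the second request arrives. Creating a new job inside the exported function, as in scraper.js above, gives each request its own input.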
chriso
  • That worked beautifully. Not really sure why... :) I tried implementing parts of your solution, then at first, it would kill my node server after every job completion. But it works great now. – Jamis Charles Oct 30 '12 at 17:29
  • Question: I was using this signature before to call the fn `nodeio.start(new nodeio.Job({timeout:10, silent: true}, methods, callback, true))`. The second param is the methods, whereas in your example the second param is the config obj. Has that API changed? – Jamis Charles Oct 30 '12 at 17:31