While building a fairly complex scraper, I ran into a problem with the control flow of my code.

Here is what the code below does: 1) requests a URL, 2) scrapes NEWURL from the result, 3) passes it to the Readability API in the first async function, and 4) this is where the trouble starts: the next async function, which saves readabilityData to the DB, never runs.

How can I solve this problem? I'm new to JS, so please feel free to point out any issues with my code.

 request(URL, function(error, response, html) {
    if (!error) {
        var $ = cheerio.load(html);
            NEWURL = data.find('a').attr('href');

            readabilityData = {}                
            var articleUrl = 'https://readability.com/api/content/v1/parser?url=' + NEWURL + token;

            async.series([
                function(){
                    request(articleUrl, function(error, response, html) {
                        if (!error) {
                            readabilityData = response.toJSON();
                        }
                    });
                },
                function(readabilityData){
                    Article.findOne({ 
                        "link": url // here's the 
                    }, function(err, link){
                        if(link) {
                            console.log(link)
                        } else {
                                var newArticle = new Article({
                        // write stuff to DB
                                });
                                newArticle.save(function (err, data) {
                        // save it
                                });
                        }   
                    });
                }
            ],
            function(err){
               console.log('all good — data written')
            });
    }
});

1 Answer

You need to call the callback parameter that's passed into the functions of the async.series call when each function's work is complete. That's how async.series knows that it can proceed to the next function. And don't redefine readabilityData as a function parameter when you're trying to use it to share data across the functions.

So something like:

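var readabilityData = {};

async.series([
    function(callback){
        request(articleUrl, function(error, response, html) {
            if (!error) {
                // Note: toJSON() serializes the response object itself; the
                // API payload lives in response.body (see the comments below).
                readabilityData = response.toJSON();
            }
            // Signal completion (or failure) so async.series can continue.
            callback(error);
        });
    },
    function(callback){
        Article.findOne({
            "link": url // here's the 
        }, function(err, link){
            if (err) {
                return callback(err);
            }
            if (link) {
                // Article is already stored, so there is nothing to save.
                console.log(link);
                callback();
            } else {
                var newArticle = new Article({
                    // write stuff to DB
                });
                newArticle.save(function (saveErr, data) {
                    // save it
                    callback(saveErr);
                });
            }
        });
    }
],
function(err){
    // Runs once all steps have finished, or as soon as one reports an error.
    if (err) {
        return console.error(err);
    }
    console.log('all good, data written');
});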
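
To see why the missing callback stalls the chain, here is a rough sketch with a hand-rolled `miniSeries` helper. It is a hypothetical, simplified stand-in for `async.series`, not the library's actual implementation, and the `log` array and step names are only for illustration. The first run never calls its callback, so the second step and the final handler never fire; the second run calls it and completes.

```javascript
// Hypothetical, simplified stand-in for async.series (not the library's code):
// runs tasks one at a time and only advances when a task calls its callback.
function miniSeries(tasks, done) {
    var index = 0;
    function next(err) {
        // Stop on the first error, or when every task has finished.
        if (err || index === tasks.length) {
            return done(err || null);
        }
        var task = tasks[index++];
        task(next);
    }
    next();
}

var log = [];

// Step 1 never calls its callback, so step 2 and the final handler never run.
miniSeries([
    function (callback) { log.push('stalled: step 1'); },
    function (callback) { log.push('stalled: step 2'); callback(); }
], function (err) { log.push('stalled: done'); });

// Every step calls its callback, so the whole chain completes.
miniSeries([
    function (callback) { log.push('ok: step 1'); callback(); },
    function (callback) { log.push('ok: step 2'); callback(); }
], function (err) { log.push('ok: done'); });

console.log(log);
// → [ 'stalled: step 1', 'ok: step 1', 'ok: step 2', 'ok: done' ]
```

The stalled run is the same situation as in the question: the first function finishes its work but never tells the runner, so nothing after it is scheduled.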
JohnnyHK
  • That did help, thank you! However, everything from readabilityData is 'undefined' when I save it to the DB. Am I missing something obvious? – walkthroughthecode Jan 06 '15 at 14:03
  • @JohnnyHK I would suggest an edit: async.series functions actually take two parameters, one is the callback and the other is the result of the previous function. That would also make the OP aware of how to use the results of previous functions. – Sikorski Jan 06 '15 at 14:05
  • @Sikorski You're thinking of `async.waterfall`; `async.series` doesn't support that. `readabilityData` should be declared as `var readabilityData = {};` so that it's not a global, but other than that it's fine. – JohnnyHK Jan 06 '15 at 14:31
  • Yes, solved it, thanks. The catch was that I should parse response.body instead of response. – walkthroughthecode Jan 06 '15 at 16:02
  • @JohnnyHK You are right, it's waterfall. Then I would suggest the OP use waterfall, since it fits his code better than series. – Sikorski Jan 07 '15 at 09:02
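
For anyone curious what the waterfall style from the comments looks like, here is a rough sketch that also parses the body instead of the response object. `miniWaterfall` is a hypothetical, simplified stand-in that follows the calling convention of `async.waterfall` (each step's results become the leading arguments of the next step), written by hand so the snippet runs without dependencies. `fakeResponse`, its JSON fields, and the `saved` object are made up for illustration and stand in for the real HTTP response and the DB write.

```javascript
// Hypothetical, simplified stand-in for async.waterfall (not the library's code):
// results passed to a step's callback become the arguments of the next step.
function miniWaterfall(tasks, done) {
    var index = 0;
    function next(err) {
        var results = Array.prototype.slice.call(arguments, 1);
        // Stop on the first error, or when every task has finished.
        if (err || index === tasks.length) {
            return done.apply(null, [err || null].concat(results));
        }
        var task = tasks[index++];
        task.apply(null, results.concat(next));
    }
    next(null);
}

// Made-up stand-in for the HTTP response; the field names are assumptions.
var fakeResponse = { statusCode: 200, body: '{"title":"Example","word_count":42}' };

var saved = null;

miniWaterfall([
    function (callback) {
        var readabilityData;
        try {
            // Parse the body string, not the response object itself.
            readabilityData = JSON.parse(fakeResponse.body);
        } catch (parseError) {
            return callback(parseError);
        }
        // Hand the parsed data to the next step instead of using a shared variable.
        callback(null, readabilityData);
    },
    function (readabilityData, callback) {
        // Stand-in for the DB write: the data arrives as a parameter.
        saved = { title: readabilityData.title, wordCount: readabilityData.word_count };
        callback(null, saved);
    }
], function (err, result) {
    if (err) {
        return console.error('failed:', err);
    }
    console.log('saved:', result);
});
```

With waterfall the second step receives `readabilityData` as an argument, so the outer shared variable is no longer needed. With series, keep the shared variable as in the answer.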