0

I found lots of similar questions, but I still don't know what's wrong with my code. It seems that I cannot read global variable value (urls) in the callback function: I want to update the urls latestTimestamp value in the callback function(err, articles). Here is the code that went wrong:

var urls=[
    {"url": "http://www.economist.com/feeds/print-sections/77/business.xml", "latestTimestamp": new Number(0)},
    {"url": "http://news.sky.com/feeds/rss/home.xml", "latestTimestamp": new Number(0)},
    ]; // Example RSS Feeds; 

// parse RssFeeds from given websites and write them into databse
function parseRssFeeds(collection){
    var feed = require('feed-read');  // require the feed-read module

    // loop through our list of RSS feed urls
    for (var j = 0; j < urls.length; j++)
    {
        console.log('Original url timestamp is: '+ urls[j].latestTimestamp.toString());

        // fetch rss feed for the url: 
        feed(urls[j], function(err, articles)
        {
            // loop through the list of articles returned
            for (var i = 0; i < articles.length; i++)
            {           
                var message = 
                    {"title": articles[i].title,
                     "link": articles[i].link,
                     "content": articles[i].content,
                     "published": articles[i].published.getTime()};

                collection.insert(message, {safe:true}, function(err, docs) {
                    if (err) {
                        console.log('Insert error: '+err);
                    }else{
                        console.log('This item timestamp is: '+ message.published);
                        // get the latest timestamp
                        if (message.published >urls[j].latestTimestamp) {
                            console.log('update timestamp to be: '+ message.published);
                            urls[j].latestTimestamp = message.published;
                        }   
                    }
                });// end collection insert         
            } //  end inner for loop
        }) // end call to feed method

    } // end urls for loop
}

Thanks for any help. The error is:

TypeError: Cannot read property 'latestTimestamp' of undefined
    at /Users/Laura/Documents/IBM/project/TestList/app.js:244:37
    at /Users/Laura/Documents/IBM/project/TestList/node_modules/mongodb/lib/mongodb/collection/core.js:123:9
    at /Users/Laura/Documents/IBM/project/TestList/node_modules/mongodb/lib/mongodb/db.js:1131:7
    at /Users/Laura/Documents/IBM/project/TestList/node_modules/mongodb/lib/mongodb/db.js:1847:9
    at Server.Base._callHandler (/Users/Laura/Documents/IBM/project/TestList/node_modules/mongodb/lib/mongodb/connection/base.js:445:41)
    at /Users/Laura/Documents/IBM/project/TestList/node_modules/mongodb/lib/mongodb/connection/server.js:478:18
    at MongoReply.parseBody (/Users/Laura/Documents/IBM/project/TestList/node_modules/mongodb/lib/mongodb/responses/mongo_reply.js:68:5)
    at null.<anonymous> (/Users/Laura/Documents/IBM/project/TestList/node_modules/mongodb/lib/mongodb/connection/server.js:436:20)
    at emit (events.js:95:17)
    at null.<anonymous> (/Users/Laura/Documents/IBM/project/TestList/node_modules/mongodb/lib/mongodb/connection/connection_pool.js:201:13)
Pointy
  • 405,095
  • 59
  • 585
  • 614
guanh01
  • 72
  • 9
  • 2
    This is an instance of a very common problem that traps JavaScript programmers, and very often in this form - a loop using an index variable, and code in the loop that sets up a callback for an asynchronous function that itself uses the index variable. – Pointy Jul 16 '14 at 15:11
  • Can you just check this question http://stackoverflow.com/questions/13343340/calling-an-asynchronous-function-within-a-for-loop-in-javascript and delete your question altogether. – Vishwanath Jul 16 '14 at 15:21
  • It's important also to understand the basic superficial meaning of that error: when you see that, it means that something to the left of `.latestTimestamp` is not really a reference to an object, as the code expects - it's `undefined`. So step 1 in finding out the problem is to look for such references in your code. – Pointy Jul 16 '14 at 15:25
  • @vishwanath yes that one's as good as any. I don't think this one should be deleted, however; closed questions are still searchable, and this might help some future programmer reach the answer. – Pointy Jul 16 '14 at 15:27
  • Also see [JavaScript closure inside loops – simple practical example](http://stackoverflow.com/q/750486/218196). – Felix Kling Jul 16 '14 at 15:35

2 Answers2

1

This should probably be closed as a duplicate, but I'll put an answer here because the relationships between all the duplicate questions are often so hard to grasp for JavaScript programmers that haven't understood the fundamental problem.

There are two ways to solve this. One way is to change the way that you create the callback. Instead of using an inline anonymous function:

    feed(urls[j], function(err, articles)
    {
        // loop through the list of articles returned
        // ...

you'd create another function that returns the callback. You'd pass that function the URL, and that's what the returned function would use:

   function makeFeedResultHandler(url) {
     return function(err, articles) {
       // loop through the list of articles returned
       // ... code as before up to this line:
                    if (message.published > url.latestTimestamp) {
                        console.log('update timestamp to be: '+ message.published);
                        url.latestTimestamp = message.published;
                    }   
       // ... etc
     };
   }

Then you'd call "feed" like this:

   feed(urls[j], makeFeedResultHandler(urls[j]));

The key difference is that each function passed to "feed" will have its own private copy of the object (well, a copy of the reference to be picky) from the "urls" array, so it won't need to refer to the variable "j" at all. That's the crux of the problem: "j" is shared by all the callbacks in your code. By the time the callbacks are invoked, the value of "j" is equal to the length of the "urls" array, so urls[j] is undefined.

The other approach would be to use the .forEach method, available in newer JavaScript implementations. That approach would get rid of "j" altogether:

  urls.forEach(function(url) {
    console.log('Original url timestamp is: '+ url.latestTimestamp.toString());

    // fetch rss feed for the url: 
    feed(url, function(err, articles)
    {
        // loop through the list of articles returned
        // ... code as previously, substituting "url" for "urls[j]" everywhere
    });
 });

Again, that makes sure that every callback sent to the "feed" function has its own copy of the "urls" element.

Pointy
  • 405,095
  • 59
  • 585
  • 614
  • I might be wrong and it's a little off-topic, but if `feed` is [this modul](https://www.npmjs.org/package/feed-read), then first param should be string (url), which is not `urls[j]`, but `urls[j].url`? It has nothing to do with the problem itself, Im just curious. – Entity Black Jul 16 '14 at 15:32
  • @Windkiller yes probably; I'm surprised it got as far as it did in the original code. I suppose that "feed" function might know what to do. – Pointy Jul 16 '14 at 15:37
1

Expanding on what @Pointy said in his comment under your post:

The insert function you are using with MongoDB is async, but you are treating the callback like it is synchronous. What is essentially happening in your loop, is everything works as planned until you hit collection.insert. From there, the process breaks off and essentially says "I'm going to tell mongo to insert a record now.. and eventually I'll expect a response." Meanwhile, the loop continues on to the next index and doesn't synchronously wait until the callback fires.

By the time your callback fires, your loop is already done, and J doesn't represent the index anymore, which is why its coming up undefined. You also run the risk of getting a different index than what you plan also with this current method.

I would recommend reworking your loop to support the async nature of node. There is a great library called - oddly enough - async that makes this process super simple. The async.each() function should help you accomplish what you are trying to do.

James LeClair
  • 1,234
  • 10
  • 12
  • Oh yes I didn't even see the `.insert()` call; I assumed it was the callback to that "feed()" thing. Same basic issue however. – Pointy Jul 16 '14 at 15:24