
I am writing a utility in node.js that has to process and concatenate a large number of files every night. In synchronous pseudocode it would look like this (omitting try / catch for clarity):

while (true) {
    var next = db.popNext();
    if (!next) return;

    out.append(next);
}

However, in the library I am using popNext() is actually a node-style asynchronous method and rather looks like this: popNext(callback).

Since I am writing the middleware from scratch I could use --harmony (e.g., generators), async or bluebird.

Ideally I would prefer something like:

forEachOrdered(db.popNext, (error, next, ok, fail) => {
   if(error) return; // skip

   // If there was an internal error, terminate the whole loop.
   if(out.append(next)) ok();
   else fail();
}).then(() => {
   // All went fine.
}).catch(e => {
   // Fail was called.
});

However, I am open to other 'standard' solutions. I was wondering what would be the most concise solution to this problem?

Edit: Just spawning all of them at the same time (in a regular for loop) would probably not solve my problem, since we're talking about hundreds of thousands of items, and for every item I have to open and read a file, so I would probably run out of file descriptors.

Benjamin Gruenbaum
left4bread

1 Answer


Here is a solution using bluebird coroutines that stays close to your "ideal" code:

var Promise = require("bluebird");
var db = Promise.promisifyAll(db); // adds popNextAsync() next to popNext()

var processAll = Promise.coroutine(function*(){
  while(true){
    var next = yield db.popNextAsync(); // promisify gives Async suffix
    if(!next) return;
    out.append(next); // some processing
  }
});
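
To make it clearer what a coroutine runner does with each `yield`, here is a minimal sketch of one written by hand. It is a simplified illustration, not bluebird's actual implementation, and `makeDb`, `popNextAsync` and `coroutine` are hypothetical names introduced only for this example. The in-memory `makeDb` stands in for the real database and signals "no more items" with `undefined`.

```javascript
// Hypothetical in-memory stand-in for the question's db: popNext(callback)
// hands out one item per call and `undefined` once the queue is empty.
function makeDb(items) {
  const queue = items.slice();
  return {
    popNext(callback) {
      setTimeout(() => callback(null, queue.shift()), 0);
    },
  };
}

// Wrap the node-style call in a promise (the part promisifyAll automates).
function popNextAsync(db) {
  return new Promise((resolve, reject) => {
    db.popNext((err, value) => (err ? reject(err) : resolve(value)));
  });
}

// Simplified coroutine runner: resumes the generator each time the promise
// it yielded settles. A rejection is thrown back into the generator at the
// `yield`, which is why try/catch around a `yield` behaves like sync code.
function coroutine(generatorFn) {
  return function (...args) {
    const gen = generatorFn.apply(this, args);
    return new Promise((resolve, reject) => {
      function step(method, input) {
        let result;
        try {
          result = gen[method](input);
        } catch (err) {
          reject(err); // the generator threw and did not catch it
          return;
        }
        if (result.done) {
          resolve(result.value);
          return;
        }
        Promise.resolve(result.value).then(
          (value) => step("next", value),
          (err) => step("throw", err)
        );
      }
      step("next", undefined);
    });
  };
}

const processAll = coroutine(function* (db, out) {
  while (true) {
    const next = yield popNextAsync(db); // one item at a time, in order
    if (next === undefined) return out;
    out.push(next);
  }
});

const coroutineDemo = processAll(makeDb(["a", "b", "c"]), []);
coroutineDemo.then((out) => console.log(out.join(","))); // prints "a,b,c"
```

Because only one `popNext` call is in flight at any time, at most one item (and so one file) is being handled at once, which avoids the file descriptor problem from the question.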

With async functions (proposed as ES2016/ES7, standardized in ES2017) this becomes:

var db = Promise.promisifyAll(db); // still need to promisify

async function processAll(){
  let next;
  while(next = await db.popNextAsync()){
     // whatever
     out.append(next);
  }
}

That said, I'd argue the output collection should be an iterable (and lazy) too, so using async generators (standardized in ES2018):

var db = Promise.promisifyAll(db);
async function* processAll(){ // named processAll so it does not shadow process(val)
    while(true){
       var val = await db.popNextAsync();
       if(!val) return;
       yield process(val); // process val and yield it forward
    }
}

If we really want to go all out here, after converting db.popNext into an async iterator this becomes, in for await...of notation:

async function* processAll(){
    for await (let next of db.asAsyncIterator()){ // need to write this like above
       yield process(next); // do some processing
    }
}
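
As a rough sketch of how such an adapter could be written, the example below turns a node-style `popNext(callback)` into an async iterator and consumes it with `for await`. The names `makeDb`, `asAsyncIterator`, `processItem` and `collect` are assumptions made for this illustration (here the adapter is a free function rather than a method on `db`), and the in-memory queue stands in for the real database.

```javascript
// Hypothetical in-memory stand-in for the db from the question.
function makeDb(items) {
  const queue = items.slice();
  return {
    popNext(callback) {
      setTimeout(() => callback(null, queue.shift()), 0);
    },
  };
}

// Hypothetical adapter: exposes popNext(callback) as an async iterator
// that finishes when popNext produces a falsy value.
async function* asAsyncIterator(db) {
  while (true) {
    const value = await new Promise((resolve, reject) => {
      db.popNext((err, v) => (err ? reject(err) : resolve(v)));
    });
    if (!value) return;
    yield value;
  }
}

// Placeholder for the real per-item processing.
function processItem(value) {
  return value.toUpperCase();
}

// Lazily processes items: nothing is popped until a consumer asks for it.
async function* processAll(db) {
  for await (const next of asAsyncIterator(db)) {
    yield processItem(next);
  }
}

// Small helper that drains an async iterable into an array.
async function collect(iterable) {
  const out = [];
  for await (const item of iterable) out.push(item);
  return out;
}

const iteratorDemo = collect(processAll(makeDb(["a", "b"])));
iteratorDemo.then((out) => console.log(out.join(","))); // prints "A,B"
```

The consumer drives the loop here, so the next item is only popped after the previous one has been handled.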

This leverages the whole async iteration API. If you can't, or don't want to, use generators, you can always convert the while loop to recursion:

function processAll(){ // works on netscape 7
   return db.popNextAsync().then(function next(value){
      if(!value) return;
      out.push(process(value));
      return db.popNextAsync().then(next); // after bluebird promisify
   });
}
Benjamin Gruenbaum
  • If you say *"ideal"*, is there a "better" (i.e., more practical) way to do it? – left4bread Jul 12 '15 at 12:31
  • @left4bread the first version of the code works without any transpilers or anything quirky on node today. All the other versions require transpilation (babel), the last one - no one wrote a transpiler for yet but it's standardized. One before the last was added to regenerator about a month ago. – Benjamin Gruenbaum Jul 12 '15 at 12:35
  • Hm, I tried solution 1) and it seems to work in principle. However, I noticed that when `yield db.popNextAsync()` has an error, the whole loop will terminate, i.e., `processAll().catch()` is being called. Is there a way to gracefully intercept `db.popNextAsync()` promises like in my updated question above? – left4bread Jul 12 '15 at 14:21
  • @left4bread surrounding them with a `try/catch` would be a good start :) – Benjamin Gruenbaum Jul 12 '15 at 14:22
  • Works, thanks! Very last question: How did you know about the try/catch? Is there good reading on the topic you can recommend? I was reading the bluebird API docs and the MDN pages on generators / yield, but neither said anything about that. – left4bread Jul 12 '15 at 14:29
  • @left4bread well, there are two questions here. *I* know about it because I implemented coroutines myself a few times, am an active bluebird contributor, wrote code that does it [for a book](https://github.com/getify/You-Dont-Know-JS) and have experience with similar techniques from other languages. As for the actual question - promises and generators are all about reclaiming sane sync control flow, you can try/catch, you can throw/return and do things like in sync JS. BTW pull requests would be very welcome for the new 3.0 docs http://bluebirdjs.com/docs/getting-started.html – Benjamin Gruenbaum Jul 12 '15 at 14:34
  • Omg ... now a switch flipped. The purpose of `processAll = Promise.coroutine` is technically not to *process all* as in being a loop construct (it just randomly happened to be one), but instead to allow for *synchronous* chaining using the yield keyword (`yield` emulating `await`). Therefore, one could argue, using constructs like `try` / `catch` would now actually be *the natural way*. – left4bread Jul 12 '15 at 14:49
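
The `try/catch` suggestion from the comments could look roughly like the sketch below, written with async/await. It assumes that a failing `popNext` still consumes the item it failed on (otherwise skipping would retry the same item forever). `makeFlakyDb` and `popNextAsync` are hypothetical helpers for this illustration, and an `Error` placed in the queue simulates an item that cannot be fetched.

```javascript
// Hypothetical stand-in for the db: an Error in the queue simulates a
// popNext call that fails for that item.
function makeFlakyDb(entries) {
  const queue = entries.slice();
  return {
    popNext(callback) {
      const entry = queue.shift();
      setTimeout(() => {
        if (entry instanceof Error) callback(entry);
        else callback(null, entry);
      }, 0);
    },
  };
}

// Wrap the node-style call in a promise.
function popNextAsync(db) {
  return new Promise((resolve, reject) => {
    db.popNext((err, value) => (err ? reject(err) : resolve(value)));
  });
}

async function processAll(db, out) {
  const skipped = [];
  while (true) {
    let next;
    try {
      next = await popNextAsync(db);
    } catch (err) {
      skipped.push(err.message); // skip this item and keep looping
      continue;
    }
    if (!next) break; // queue exhausted
    // Deliberately outside the try block: an exception thrown here
    // rejects the promise returned by processAll and ends the loop.
    out.push(next);
  }
  return { out, skipped };
}

const skipDemo = processAll(
  makeFlakyDb(["a", new Error("unreadable"), "b"]),
  []
);
skipDemo.then(({ out, skipped }) => {
  console.log(out.join(","), "| skipped:", skipped.join(","));
});
```

Keeping the output step outside the `try` block mirrors the behaviour asked for in the question: fetch errors are skipped, while an internal error in the output step terminates the whole loop.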