11

My node.js script reads rows from a table in database 1, does some processing and writes the rows to database 2.

The script should exit after everything is done.

How can I know when everything has been done, so that I can exit node at that point?

If I have a callback function like this:

function exit_node() {
  process.exit();
}

(Edit: in the meantime it became obvious that process.exit() could also be replaced with db.close(), but the question is not what to put in there. The question is when exactly to do this, i.e. how and where to execute this callback.)

But it is not easy to attach it somewhere. Attaching it after the last read from db1 is not correct, because the processing and writing still have to happen.

Attaching it to the write to db2 is not easy, because it has to be attached after the last write, but each write is independent and does not know whether it is the last write.

It could also theoretically happen that the last write has finished but an earlier write is still executing.


Edit: Sorry, I can see the explanation of the question is not complete and probably confusing, but some people still understood and there are good answers below. Please continue reading the comments and the answers and it should give you the whole picture.


Edit: I could think of some "blocking" controller mechanism. The different parts of the script add blockers to the controller for each open "job" and release them after the job is finished, and the controller exits the script when no more blockers are present. Maybe async could help: https://github.com/caolan/async

I also fear this would blow up the code and the logic unreasonably.

SHernandez
  • I don't understand your question. Node *does* exit when everything is done. Once there are no more asynchronous calls in progress -- once you're done processing the last record you read, and all your writes have completed -- then Node will exit on its own; you don't have to do anything special. – Joe White Jun 03 '12 at 18:15
  • 6
    @JoeWhite Many database modules will hold open a connection, keeping the process running. I think the question (correct me if I'm wrong, SHernandez) is how to know when Node is done with all its database-related async IO, so that he can know when to shut down the DB connection, allowing the process to exit. – Michelle Tilley Jun 03 '12 at 18:21
  • 2
    Node won't exit as long as one or more callbacks are waiting or event emitters are active. As @BrandonTilley mentioned, some DB modules keep the DB connection open (that's an EventEmitter), so you should close it after you are done with your queries. So in the query's callback, write something like `db.close` (see docs), process the results, and node will exit by itself (without you calling `process.exit`). – Aleksei Zabrodskii Jun 03 '12 at 18:38
  • I did not understand previously why node did not exit, but now I see that it is probably because of the database module, which keeps something open. @elmigranto what you write about closing the db connection hits the same problem that I describe in the question: There is no simple way to know when all writing has been done. If I know when to call db.close I also know when to call process.exit ;-) – SHernandez Jun 03 '12 at 18:55
  • These comments were very helpful for me. I had a redis connection open, which was preventing the process from exiting. Closing the connection removed my need to exit(). – Jim Clouse Apr 18 '14 at 18:24

3 Answers

13

JohnnyHK gives good advice, except that Node.js already does option 2 for you.

When there is no more i/o, timers, intervals, etc. (no more work expected), the process will exit. If your program does not automatically exit after all its work is done, then you have a bug. Perhaps you forgot to close a DB connection, or to clearTimeout() or clearInterval(). But instead of calling process.exit() you might take this opportunity to identify your leak.
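As a rough illustration of that behaviour, here is a minimal sketch. `openFakeConnection` is a hypothetical stand-in for a database client (not a real driver API); it assumes the client keeps a handle open, modelled here as an interval, until `close()` is called.

```javascript
// Hypothetical stand-in for a database client: like many real drivers,
// it keeps a handle open (here an interval) until close() is called.
function openFakeConnection() {
  const heartbeat = setInterval(() => {}, 1000);
  return {
    closed: false,
    close() {
      clearInterval(heartbeat); // release the handle that kept the event loop busy
      this.closed = true;
    }
  };
}

const conn = openFakeConnection();

// Simulate the last piece of work finishing, then close the connection.
setTimeout(() => {
  conn.close();
  // No process.exit() here: with nothing left pending, Node exits by itself.
}, 50);

process.on('exit', (code) => {
  console.log('process exited on its own, code', code);
});
```

If the `conn.close()` call is removed, the interval keeps the event loop alive and the script never returns, which is the kind of leak described above.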

JasonSmith
  • 1
    JasonSmith, it is the other way around, I tried to explain in the question. Because I use callbacks none of them knows which is the last one. And as soon as I know, when everything is done, I can close the db connection or exit or whatever. But that was the reason for my question in the first place. – SHernandez Jun 04 '12 at 22:45
  • 3
    this is it :) , thanks, `If your program does not automatically exit after all its work is done, then you have a bug` – Rabea Mar 04 '16 at 02:09
  • Love y'all! Thought that having an async function in the root of my batch script is simply doomed in node. Turned out I just didn't close the RedisClient connection. All good now. Thx! – kub1x Apr 17 '18 at 15:00
  • sorry for the follow-up question, but do you have an easy way to find out which statement/component is preventing my program from exiting? – Jayson Cheng Jul 10 '18 at 04:07
8

The two main options are:

  1. Use an asynchronous processing coordination module like async (as you mentioned).
  2. Keep your own count of outstanding writes and then exit when the count reaches 0.
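Option 2 could look roughly like the sketch below. `createTracker` and `fakeWrite` are hypothetical names invented for illustration, and `fakeWrite` only simulates an asynchronous write with a timer. The extra `start()`/`done()` pair around the read phase is one way to keep the count from reaching 0 while writes are still being scheduled.

```javascript
// Hypothetical helper: counts outstanding jobs and calls onIdle
// when the count drops back to zero.
function createTracker(onIdle) {
  let pending = 0;
  return {
    start() {
      pending += 1;
    },
    done() {
      pending -= 1;
      if (pending === 0) onIdle();
    },
    get pending() {
      return pending;
    }
  };
}

// Stand-in for an asynchronous write to db2 (an assumption, not a real driver call).
function fakeWrite(row, cb) {
  setTimeout(cb, 10);
}

const tracker = createTracker(() => {
  console.log('all writes finished; close the connection here');
});

const rows = ['a', 'b', 'c'];

tracker.start(); // hold one slot while writes are still being scheduled
rows.forEach((row) => {
  tracker.start();
  fakeWrite(row, () => tracker.done());
});
tracker.done(); // scheduling is complete; release the extra slot
```

The order in which the writes complete does not matter here: the callback fires only when the last outstanding job reports in, whichever one that turns out to be.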
JohnnyHK
  • I like no 2. It should not be a problem increasing and decreasing the same global counter from different callback routines in parallel? Increasing and decreasing can occur at any time from different parts of the application, can I do this in node? – SHernandez Jun 03 '12 at 19:03
  • There's no issue with contention as while the processing is asynchronous, it's still single threaded. – JohnnyHK Jun 03 '12 at 19:10
  • @SHernandez nothing in node itself is parallel. There are no two lines of code in your whole project that could be running at the very same time. So there is no need for access serialization. – Aleksei Zabrodskii Jun 04 '12 at 01:34
  • One further note: #2 is made much easier if you wrap all your DB calls in some kind of abstraction layer. – Michelle Tilley Jun 04 '12 at 04:04
3

Though using a counter could be tempting (a simple, easily implemented idea), it will pollute your code with unrelated logic. The async module can do the bookkeeping instead:

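db1.query(selector, function(err, results) {
  db1.close();
  if (err) throw err;

  // let's suppose the callback gets an array of results;
  // once all writes are finished (or one fails), async calls the final callback
  async.forEach(results, processRow, function(err) {
    // wrapped in a function so that `close` keeps its `this` binding
    db2.close();
    if (err) throw err;
  });
});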

// async iterator
function processRow(row, cb) {
  // modify row somehow
  var newRow = foo(row);

  // insert it in another db;
  // once write is done, call `cb` to notify async
  db2.insert(newRow, cb);
}

That said, using process.exit here feels to me like C++ new without delete. Bad comparison maybe, but can't help it :)
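On current Node.js versions, the same wait-for-all-writes shape can also be sketched with promises instead of the async library. This is only an illustration under assumptions: `createFakeDb`, `copyRows`, and the promise-returning `insert` are hypothetical names, and the fake database is an in-memory stand-in rather than a real driver.

```javascript
// Hypothetical in-memory stand-in for db2; insert() and close() are assumed names.
function createFakeDb() {
  return {
    rows: [],
    closed: false,
    insert(row) {
      return new Promise((resolve) => {
        setTimeout(() => {
          this.rows.push(row);
          resolve();
        }, 5);
      });
    },
    close() {
      this.closed = true;
    }
  };
}

// Start every write, wait until all of them have settled, then close exactly once.
async function copyRows(rows, transform, db2) {
  const results = await Promise.allSettled(
    rows.map((row) => db2.insert(transform(row)))
  );
  db2.close(); // no write is still in flight at this point
  const failed = results.find((r) => r.status === 'rejected');
  if (failed) throw failed.reason;
}

const db2 = createFakeDb();
copyRows([1, 2, 3], (n) => n * 2, db2)
  .then(() => console.log('wrote', db2.rows.length, 'rows; closed:', db2.closed))
  .catch((err) => {
    console.error(err);
    process.exitCode = 1;
  });
```

`Promise.allSettled` is used rather than `Promise.all` so that the connection is not closed while other writes are still running after one of them fails.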

Aleksei Zabrodskii
  • What does selector stand for? (First parameter of db1.query) ? – SHernandez Jun 04 '12 at 22:55
  • @SHernandez You **select** rows somehow, right? If it's Postgres, `selector` would be a string (`SELECT now();` or something). If it's other DB, selector would be something else. Same thing with `newRow` on insertion. Anyway, that was an idea in my answer, not real code. – Aleksei Zabrodskii Jun 05 '12 at 12:50