We are developing a complex set of services on Linux, and we wrote a tool that starts them one by one. Among the many considerations behind the tool were the order in which the services can be started and a way to ensure that a daemon auto-restarts if it dies. There are also server-wide parameters that are shared between all the services.

However, I now have a problem: shutting down such a system takes time. It can take as much as 10 seconds to shut everything down.

What I'm wondering is: how long can a script defined under /etc/init.d/... take to shut down the daemons it controls?

Although I would imagine that if we were to break all of those daemons down into separate packages (since startup scripts can now include a list of dependencies...), we would bump into the exact same problem. So at this point we prefer to keep things the way they are...

Is there a well defined/known amount of time that a shutdown must take at the most to be graceful to all daemons?

Alexis Wilke
  • Minutes or hours even? 10 seconds hardly seems worth worrying about. – Michael Hampton May 14 '16 at 04:02
  • Well... it has been increasing and it could take much longer in some situations. But yeah, at this point it's not too bad... – Alexis Wilke May 14 '16 at 04:03
  • You may wish to switch to `systemd` [Note: I'm _not_ a fanboi for systemd, but ...]. The _original_ rationale in the white paper http://0pointer.de/blog/projects/systemd.html was that bash scripts use utilities and do plenty of [wasteful] fork/execs (e.g. `x=$(grep ...)`) and if the scripts could be eliminated it would save a lot of time. It actually does. Also, with systemd, it can build a dependency graph and do things in parallel on multiple cores. Ditto for shutdown. – Craig Estey May 14 '16 at 04:29
  • what is the problem you are trying to solve here? – aaaaa says reinstate Monica May 14 '16 at 05:37
  • I've seen far more people worried that their shutdown process wasn't waiting long enough for the daemons to exit gracefully than the non issue you seem to be worrying about. – Julie Pelletier May 14 '16 at 06:09
  • That's what I'm worried about as I add more and more things that are slowly, but surely increasing the time it takes to shutdown. From what I can see, at this time it does not seem to be a problem. – Alexis Wilke May 14 '16 at 06:37
  • @aaaaaa, a nice shutdown opposed to having my processes receive a KILL event, in part because some of those are working in the database and to avoid inconsistencies, a graceful shutdown is highly preferred. – Alexis Wilke May 16 '16 at 07:16
  • OK, then the question really is: what are the established ways (beyond standard built-ins like `systemd`'s) to properly ensure a graceful shutdown? In the case of service A it might take up to 1 minute, in the case of service B up to an hour (imagine a shutdown involving a total backup to tape). But the underlying problem is gracefulness, not time. Define your "graceful shutdown" and ask how to ensure it – aaaaa says reinstate Monica May 16 '16 at 07:36
  • @CraigEstey, as a side note, when we started, systemd was just being implemented, so we skipped that one. But at some point we probably will change our initialization process to make use of it instead. – Alexis Wilke May 17 '16 at 00:20
  • A wise decision. systemd was _not_ ready for primetime back in the day. The concept was a good one, but the C code looks like newbie level, style wise. I have been using systemd [forced to, by using fedora ;-)] and I used to gnash my teeth. It's tolerable now. Now, my complaint is it usurping gdm and other user login stuff and wants to be everywhere in everything. It has also replaced shell in ramdisk boot. But, I have seen shutdown times go _down_ and the system shutdown no longer freezes if some service doesn't stop when requested. So, maybe worth a try. – Craig Estey May 17 '16 at 01:35
  • That said, a caveat: systemd comes from the same personages that brought us pulseaudio. The general M/O seems to be, publish whatever, with inadequate [or no] testing, ignore/deny bug reports (e.g. "You just don't understand it ..."). Linus [Torvalds] has gone on record about the lack of timely bug fixes, initial checkin of broken code, etc. and one or more of the developers got semi-banned. – Craig Estey May 17 '16 at 01:43
  • As a stopgap, you might be able to identify the init.d "hot spot" [bash] scripts [that do a lot of useless fork/exec]. Recode them in perl/python. (i.e.) Whatever bash needed to fork/exec for, can come from an intrinsic part of the language. I've actually done this with perl before when I was tasked with speeding up boot times. And, IMO, in addition to the fork/exec, the perl/python version will run faster still because they precompile to VM's instead of a line-by-line interpreter. As to the original question: ASAP. If you have a UPS saying shutdown, it may only have X seconds of power – Craig Estey May 17 '16 at 01:52

2 Answers

Is there a well defined/known amount of time that a shutdown must take at the most?

No.

user9517
  • What about the [`TimeoutStopSec`](https://www.freedesktop.org/software/systemd/man/systemd.service.html#TimeoutStopSec=) parameter? It has a default of 90 seconds... – Alexis Wilke Aug 06 '16 at 17:29

Having now tested a shutdown of various daemons on a system running systemd, I can attest that the stop timeout is clearly defined for each daemon.

From what I can tell, it also applies to daemons that are still started/stopped with a SysV script. When Cassandra is still working on its files, doing a `systemctl restart cassandra` will not work as expected. For such services, you probably want to do a `systemctl stop cassandra` and, once you are sure it has stopped, do a `systemctl start cassandra`.
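
The stop, verify, then start sequence could be sketched as a small shell helper. This is only a sketch under assumptions: the function name `stop_then_start` and the 300-second default wait are made up for illustration, and it relies on `systemctl is-active` printing the unit state (`inactive`, `failed`, `deactivating`, ...).

```shell
#!/bin/sh
# Hypothetical helper (not from the original answer): stop a unit, wait until
# systemd reports it as fully stopped, and only then start it again.
# Usage: stop_then_start UNIT [MAX_WAIT_SECONDS]
stop_then_start() {
    unit=$1
    max_wait=${2:-300}   # assumed upper bound in seconds; tune per service
    waited=0

    systemctl stop "$unit" || return 1

    while :; do
        # is-active prints the current state; it exits non-zero for anything
        # other than "active", hence the "|| true".
        state=$(systemctl is-active "$unit" || true)
        case $state in
            inactive|failed) break ;;   # the unit is really down
        esac
        if [ "$waited" -ge "$max_wait" ]; then
            echo "$unit still '$state' after ${max_wait}s, not starting it" >&2
            return 1
        fi
        sleep 1
        waited=$((waited + 1))
    done

    systemctl start "$unit"
}
```

It would be called as `stop_then_start cassandra 600`, for example. Checking the printed state rather than the exit code means a unit that is still `deactivating` is not mistaken for a stopped one.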

So... you may define/change the `TimeoutStopSec` parameter on a per-daemon basis. This gives you great granularity!

[Service]
...
TimeoutStopSec=120

You may also change the system-wide default with `DefaultTimeoutStopSec` in `/etc/systemd/system.conf` (which is probably not advisable...)

There is another important timing: the restart delay, set with `RestartSec` (shown in the last link). It matters because, by default, systemd waits only 100ms before restarting a process! So if your daemon takes up to 2 minutes to shut down, it won't work right...
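
As a rough illustration, the stop timeout and the restart delay could be set together in a drop-in file. The drop-in path, the `Restart=on-failure` policy, and the 5-second delay below are assumptions chosen for the example, not values from my setup:

```ini
# /etc/systemd/system/cassandra.service.d/timeouts.conf  (hypothetical drop-in path)
[Service]
# Give the daemon up to 2 minutes to stop cleanly before systemd escalates to SIGKILL.
TimeoutStopSec=120
# Assumed policy for this example: restart the daemon automatically if it fails...
Restart=on-failure
# ...but wait 5 seconds (illustrative value) instead of the 100ms default.
RestartSec=5
```

After adding or editing such a file, run `systemctl daemon-reload` so systemd picks up the change.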


For those interested: for Cassandra, I actually first run a script which stops Cassandra. Then I proceed with the shutdown.

This can take as much time as Cassandra needs (it can be quite long), but it will cleanly stop Cassandra. Note that shutting down that way may feel slow, but on a restart, Cassandra will be ready nearly instantaneously.

In comparison, shutting down really fast means killing Cassandra, and on a restart it has to go back through its journals, which actually takes far longer. So that's a good trade-off.
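
A minimal sketch of what such a pre-shutdown step might look like is below. It is not my actual script: the function name is hypothetical, and calling `nodetool drain` first (which flushes memtables to disk so there is little or no commit log to replay on the next start) is an assumption about how one might do it.

```shell
#!/bin/sh
# Hypothetical pre-shutdown helper: stop Cassandra cleanly, then power off.
stop_cassandra_then_poweroff() {
    # Assumed step: flush memtables and stop accepting writes before stopping.
    nodetool drain || echo "nodetool drain failed, continuing with stop" >&2

    # Without --no-block, systemctl waits for the stop job to finish
    # (or for the unit's stop timeout to expire).
    systemctl stop cassandra || return 1

    # Only reached once Cassandra has been stopped.
    systemctl poweroff
}
```

The function only powers the machine off when the stop succeeded, so a failed stop leaves the system up for inspection.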

Alexis Wilke