We are currently using the latest daemontools (http://cr.yp.to/daemontools.html) to manage the background application servers on our Linux (Amazon Linux) hosts. The application servers run in JVMs:

[ec2-user@ip-10-0-1-220 local]$ java -version
java version "1.7.0_75"
OpenJDK Runtime Environment (amzn-2.5.4.0.53.amzn1-x86_64 u75-b13)
OpenJDK 64-Bit Server VM (build 24.75-b04, mixed mode)

Everything works as expected until we restart the server:

sudo shutdown -r now

When the server restarts, the configured daemontools services start and run fine for roughly 10-20 minutes. After that, however, threads within the application servers begin to hang until the entire process is frozen. The only fix we have found so far is to recreate the service directory under /service/...

The symptoms may indicate corrupted data in the /service/.../supervise/ directory. We have not found any earlier discussion of this issue.

Any suggestions or advice on how we can restart our servers without this problem would be greatly appreciated.

MarcF

1 Answer

The first diagnostic step is to run sudo ./run from the service directory and make sure it keeps running in the foreground. If it doesn't, you'll need to fix the problem in your application.

If it runs fine manually, the problem could be in how you're setting up the service directory. Can you post the steps you use to create it?
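One caveat with the foreground check: an interactive shell carries environment variables and a terminal that a service started at boot does not have. The sketch below is one possible way to run ./run under conditions closer to boot, with an emptied environment and stdin taken from /dev/null. The function name run_like_boot and the PATH value are assumptions made for illustration; they are not part of daemontools, and the PATH should be adjusted to the host.

```shell
#!/bin/sh
# Hypothetical helper, not part of daemontools: run a service's ./run script
# in the foreground with a stripped-down environment.
# Usage: run_like_boot /path/to/service-dir
run_like_boot() {
    dir=$1
    # Refuse to continue if there is no executable run script.
    if [ ! -x "$dir/run" ]; then
        echo "no executable run script in $dir" >&2
        return 1
    fi
    # env -i clears the inherited environment; the PATH below is an assumed value.
    # stdin comes from /dev/null so the script cannot depend on a terminal.
    ( cd "$dir" && env -i PATH="/command:/usr/local/bin:/usr/bin:/bin" ./run </dev/null )
}
```

If the application only misbehaves when started this way, the difference between the interactive and the boot-time environment is a reasonable place to look next.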

drewr
  • The applications run without issue when not executed via "svc -u". Steps required to reproduce are provided in the post. – MarcF Apr 02 '15 at 17:17
  • I don't see the directory creation steps. A properly created directory structure and `run` should have you never needing to debug anything in `supervise`. Can you provide your `run` program? – drewr Apr 03 '15 at 02:15
  • I've recently managed to recreate this problem. Using a fresh Amazon Linux AMI (HVM), install as per steps provided at http://cr.yp.to/daemontools/install.html (+compile bug fix). This creates the /command and /service directories. I then create my service directory as follows: – MarcF Apr 30 '15 at 10:41
  • "cd /service; sudo mkdir testservice; sudo chown ec2-user:root -R testservice;" I then place my run and executables in the new directory and start the service. Everything runs fine. Now restart the machine a few times and voila the service will always start but eventually all threads lock and stop. Running the ./run file external to SVC runs without issues, always. – MarcF Apr 30 '15 at 10:44
  • Could be various reasons. Feels like permissions to me. *"...place my run and executables in the new directory.."* <- that's a lot of important stuff you glossed over. The way you wrote `run`, how you're executing it interactively, and even what the application threads are doing, are crucial pieces to the puzzle. – drewr Apr 30 '15 at 16:40
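
For comparison with the setup steps described in the comments, here is a minimal sketch of a conventional daemontools layout: the service directory is built outside /service, its run script uses exec so that supervise controls the real process, and the directory is then linked into /service. The helper name make_service_dir, the staging path, and the jar path are placeholders, not details from the original post.

```shell
#!/bin/sh
# Hypothetical helper: write a minimal service directory with a run script.
# Usage: make_service_dir DIR USER COMMAND [ARGS...]
make_service_dir() {
    dir=$1
    user=$2
    shift 2
    mkdir -p "$dir" || return 1
    {
        echo '#!/bin/sh'
        # Send stderr to the same place as stdout.
        echo 'exec 2>&1'
        # exec replaces the shell, so signals from supervise reach the program;
        # setuidgid (shipped with daemontools) drops root privileges.
        echo "exec setuidgid $user $*"
    } > "$dir/run"
    chmod 755 "$dir/run"
}

# Example with placeholder paths (run by hand, adjust to the real application):
#   make_service_dir /home/ec2-user/services/testservice ec2-user java -jar /opt/app/app.jar
#   sudo ln -s /home/ec2-user/services/testservice /service/testservice
```

With this layout the service can be removed by deleting the symlink and stopping supervise, without touching the files under the staging directory.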