
I've created an init.d script called rmCluster that is supposed to run a simple Python script at shutdown. The Python script uses boto to shut down a particular cluster of servers. The init script has 755 permissions, is located at /etc/init.d/rmCluster, and is written as:

#!/bin/sh
#
# chkconfig: 0 1 1
# description: My service
#
# Author: Me
#
#
### BEGIN INIT INFO
# Provides: rmCluster
# Required-Start:
# Required-Stop:
# Default-Start:  0
# Default-Stop:  0
# Short-Description: My service
# Description: My service
### END INIT INFO

case $1 in
start)
python /usr/local/sbin/instanceStopper.py &
touch /tmp/theScriptWorks
;;
esac
exit 0

I have also created a symlink at /etc/rc0.d/S00rmCluster which points to the above. Note that the script also touches a file in /tmp, and that file is created successfully.

The python script also has 755 permissions and is written as:

#!/usr/bin/env python

import boto.ec2
import subprocess

conn=boto.ec2.connect_to_region("us-west-2")
reservations = conn.get_all_instances()
cluster = []
inst_id = subprocess.Popen(["wget", "-q", "-O", "-", "http://169.254.169.254/latest/meta-data/instance-id"], stdout=subprocess.PIPE).communicate()[0]

for res in reservations:
    for inst in res.instances:
        if inst_id in inst.tags["Name"] and "cloudformation" not in inst.tags:
            cluster.append( "%s" %(inst.id) )

conn.terminate_instances(cluster)

Note that the Python script works perfectly fine when called directly, and it also works fine when I run the init.d script directly. I've also tried removing the shebang from the Python script and specifying the path to python in the init.d call, and it still doesn't work.

My initial thought is that perhaps the Python libraries are no longer available at this point in the shutdown, so the script fails, but I'm not sure how to check that. I've also considered that perhaps the script needs to be placed somewhere else in the rcX.d directories. Currently I have it set to S00, and it is the only S00. I moved killall to S01 and halt to S02; these are the only three "S" scripts within rc0.d/.
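One way to check whether imports or network calls fail at shutdown is to wrap the script body so that any exception is written to a log file on disk. A minimal sketch, assuming Python 3; the helper name run_logged and the stand-in task are hypothetical and not part of the original script:

```python
import traceback


def run_logged(task, log_path):
    """Run task(); on any exception, append the traceback to log_path.

    Returns True if task() succeeded, False otherwise.
    """
    try:
        task()
        return True
    except Exception:
        # Append so that repeated shutdowns do not overwrite earlier evidence.
        with open(log_path, "a") as log:
            log.write(traceback.format_exc())
        return False


def failing_task():
    # Stand-in for the real work; simulates a library missing at shutdown.
    raise ImportError("No module named boto.ec2")
```

If the real work, including the import of boto.ec2, were moved into a function passed to run_logged with a log path on a persistent file system, a traceback would be left behind to read after the next boot.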

I do appreciate the help.

Solution

The solution was a combination of input from the answers of @Jayan and @Kjetil Joergensen.

The final working version of the init.d script is as follows:

#!/bin/bash
#
# chkconfig: 2345 99 1
# description: My service
#
# Author: me
#
#
### BEGIN INIT INFO
# Provides: rmCluster
# Required-Start:
# Required-Stop:
# Default-Start:  0
# Default-Stop:  0
# Short-Description: My service
# Description: My service
### END INIT INFO


case "$1" in
start)
touch /var/lock/subsys/rmCluster
;;
stop)
/usr/bin/python /usr/local/sbin/instanceStopper.py
;;
esac
exit 0

The major changes were:

  1. Moving the 'start)' portion into a 'stop)' portion
  2. Touching the lock file in the 'start)' portion
  3. Modifying the 'chkconfig:' parameter so that it 'starts' with normal services and gets killed with them as well, thus preventing the script from trying to execute after 'networking' has shut down, as noticed by @Kjetil Joergensen

Note: The python script was not changed.

Two caveats. The first is that service rmCluster start must be run beforehand for the stop action to run on entering runlevels 0 and 6. For me this was acceptable, since the instance is set up during CloudFormation provisioning and it is trivial to add this step to the EC2 User Data. The second is that the script also executes during reboots, which may not be ideal for every use case. I'll have to investigate further how to make only runlevel 0 actually run 'stop' on this script.
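For the second caveat, one possible approach is to have the Python script check which runlevel is being entered and skip termination on reboot. A minimal sketch, assuming that init exports a RUNLEVEL environment variable to rc scripts (this should be verified on the target system); the function name is_halt is hypothetical:

```python
import os


def is_halt(environ=None):
    """Return True only when the target runlevel is 0 (halt), not 6 (reboot).

    Assumption: the init system sets RUNLEVEL in the environment of rc
    scripts. If the variable is missing, this returns False.
    """
    if environ is None:
        environ = os.environ
    return environ.get("RUNLEVEL") == "0"


if __name__ == "__main__":
    if is_halt():
        print("runlevel 0: would terminate the cluster here")
    else:
        print("not a halt: skipping termination")
```

Defaulting to "do nothing" when the variable is absent is a deliberate choice here, since terminating a cluster by mistake is worse than leaving it running.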

Thank you both for the help.

DefionsCode
  • I wish I could give both of you the green check mark. Ultimately I decided on what I think a proper answer should be: telling the OP how to figure out the answer. Thanks again. – DefionsCode Oct 10 '14 at 13:30

2 Answers


(Almost) everything you need to know is in /etc/rc.d/rc. It's the shell script that's used for changing runlevels, and it's fairly readable, so it should be somewhat easy to suss out what it does.

The brief description of what it does is:

  • It first goes through every /etc/rc<runlevel>.d/K<num><subsystem> script, checks whether it's started by looking for /var/lock/subsys/<subsystem>, and runs stop if it is
  • It then goes through every /etc/rc<runlevel>.d/S<num><subsystem> script, checks whether it's stopped by checking for /var/lock/subsys/<subsystem>, and runs start on it if it is

(There's probably some convenience function around dealing with /var/lock/subsys)
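The two passes described above can be modelled roughly as follows. This is a simplified sketch in Python, not the actual contents of /etc/rc.d/rc (which is a shell script and handles more cases); the function name plan_runlevel_change and the in-memory set of lock names are hypothetical:

```python
import re


def plan_runlevel_change(rc_entries, locked):
    """Return the (action, subsystem) pairs this model of rc would run.

    rc_entries: link names in /etc/rc<runlevel>.d, e.g. "K01rmCluster".
    locked: set of subsystem names that have a /var/lock/subsys/<name> file.
    """
    actions = []
    parsed = []
    for name in sorted(rc_entries):
        match = re.match(r"^([KS])(\d+)(.+)$", name)
        if match:
            parsed.append((match.group(1), match.group(3)))
    # Pass 1: K scripts are stopped only if their lock file exists.
    for kind, subsystem in parsed:
        if kind == "K" and subsystem in locked:
            actions.append(("stop", subsystem))
    # Pass 2: S scripts are started only if their lock file is absent.
    for kind, subsystem in parsed:
        if kind == "S" and subsystem not in locked:
            actions.append(("start", subsystem))
    return actions


entries = ["K01rmCluster", "K90network", "S00killall", "S01halt"]
print(plan_runlevel_change(entries, {"rmCluster", "network"}))
print(plan_runlevel_change(entries, {"network"}))
```

In the second call the lock for rmCluster is missing, so its stop action is skipped. In this model, a script whose work lives in stop never runs at shutdown unless its lock file exists.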

If everything before this holds true, what you'll want to do is probably:

  • Ensure there's a /var/lock/subsys/<yourscriptname> present
  • Runlevel 0 seems appropriate (unless you also want to include reboot, which is 6), and you'll want to run it as /etc/rc0.d/K<num><yourscriptname> with <num> below 90, since networking is killed off at 90. That means moving your implementation to stop rather than start. You could potentially also "start" your script as part of the relevant runlevels (3 and 5; 1 is single user with no network, and 2 and 4 are unused) by just leaving behind the appropriate file in /var/lock/subsys
  • You definitely want to get rid of the ampersand: with it, your init script returns before the Python script is done. Depending on how fast rc chews through the rest of the scripts, it will get to 90 and kill off networking, at some point later get to killall, and eventually halt. To avoid shutdown hanging indefinitely, you'll want to do the appropriate error handling and timeout handling in your script rather than just fire it off and leave the rest up to chance.
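A bounded wait could look like the sketch below: it runs a command synchronously, as the init script would without the ampersand, but gives up after a deadline. It assumes Python 3 (subprocess.run with a timeout argument); the function name run_bounded, the stand-in command, and the 60-second figure are hypothetical:

```python
import subprocess
import sys


def run_bounded(cmd, timeout_seconds):
    """Run cmd and wait for it, but never longer than timeout_seconds.

    Returns the exit code, or None if the command was killed on timeout.
    """
    try:
        completed = subprocess.run(cmd, timeout=timeout_seconds)
        return completed.returncode
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before re-raising TimeoutExpired.
        return None


if __name__ == "__main__":
    # Stand-in command; the real call would be the cluster-stopping script.
    code = run_bounded([sys.executable, "-c", "print('done')"], 60)
    print("exit code:", code)
```

Returning None on timeout lets the caller log the failure and still exit, so the rest of the shutdown sequence is not blocked.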
Kjetil Joergensen
  • This is great. I appreciate the learn-to-fish approach. The solution was a combination of your advice and @Jayan's approach. I'll update the question soon. – DefionsCode Oct 10 '14 at 13:16

Why don't you try changing your init script to start with

chkconfig: 2345 99 1

Then move your code from the "start" case to "stop", leaving an empty "start" case. After placing your script in /etc/init.d, run chkconfig --add for it.

Note: you may have to delete any softlinks that you may have already created.

Also, please make sure the proper PATH is set when your init script is executed. Since your Python program is already an executable file, maybe you can just call it like

/path/to/program &

instead of

python /path/to/program &

Also, in the "start" section of the init file, add the following line:

touch /var/lock/subsys/program

This creates a lock file. When the machine is rebooting or stopping, the system checks the state of each service before initiating a stop. If it finds that a service is not running (the lock file is not present), it might not run the "stop" procedure.

Jayan