How to design a resilient and highly available service in python?

Question

I am trying to design a resilient and highly available Python API back-end service. The core service is designed to run continuously, and it has to run independently for each of my tenants. This is required because the core service is blocking, so each tenant's execution needs to be independent of every other tenant's.

The core service is to be started by a provisioning service. The provisioner is also a continuously running service and is responsible for the housekeeping functions, i.e. starting the core service on tenant sign-up, checking for the required environment and attributes, stopping the core service, etc.

Currently I am using the multiprocessing module to spawn child instances of the core service from the provisioner service. Having a multi-threaded service with one thread for each tenant is also an option, but it has the drawback that a crash in any one thread disrupts service for the other tenants. Ideally I would like all of these to run as background processes. The problems are:

If I daemonize the provisioner service, multiprocessing will not let that daemon create child processes. This is written here
If the provisioner service dies, all the children become orphans. How do I recover from this situation?

Obviously, I am open to solutions that do not follow this multiprocessing usage model.

Gabriel Samfira · Answer 1 · 2013-07-29T09:49:41.470

2

I would recommend you take a different approach. Use the system tools available in your distribution to manage the life-cycle of your processes instead of spawning them yourself. The provisioner would be much simpler as well, as it will not have to reproduce what your operating system can do with little effort.

On Ubuntu/CentOS 6 systems you can use Upstart, which has several advantages over the old sysvinit (aggressive parallelisation, respawning, simple init config syntax, etc).

There is also systemd, which is similar to Upstart in design and is the default in openSUSE.

The provisioner could then be used only to create the needed init config for each service, and start or stop them using the subprocess module. You could then monitor your instances in case upstart was not able to respawn an instance, and send an alert, or try to start the service again.
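
A minimal sketch of the start/stop/status calls described above, assuming Upstart's `start`, `stop` and `status` commands are on the PATH and the provisioner has the privileges to run them. The helper names (`job_command`, `control_job`, `is_running`) are hypothetical, and the parsing assumes the usual `status` output of the form `<job> start/running, process <pid>`.

```python
import subprocess

# Only the three Upstart commands the provisioner needs in this sketch.
VALID_ACTIONS = ("start", "stop", "status")


def job_command(action, job_name):
    """Build the argument list for an Upstart job command."""
    if action not in VALID_ACTIONS:
        raise ValueError("unsupported action: %r" % (action,))
    return [action, job_name]


def control_job(action, job_name):
    """Run the command and return (exit_code, combined_output_text)."""
    proc = subprocess.Popen(
        job_command(action, job_name),
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
    )
    output, _ = proc.communicate()
    return proc.returncode, output.decode("utf-8", "replace")


def is_running(status_output):
    """Interpret output such as 'alice_service start/running, process 1234'."""
    return "start/running" in status_output
```

The provisioner could call `control_job("status", name)` periodically and raise an alert, or retry `control_job("start", name)`, when `is_running` returns False.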

Using this approach, you isolate all instances of user services from one another. If the provisioner crashes, the rest of the services will remain up.

For example, say your provisioner is running in the background. It gets a message via AMQP or some other means to create a user and start services for that user. One possible flow would be:

create user
Do any bootstrap needed for new users
Create /etc/init/[username]_service.conf
start [username]_service

The init script could look similar to:

description "start Service for [username]"

start on runlevel [2345]
stop on runlevel [!2345]

respawn

# Run before process
pre-start script
end script

exec /bin/su -c "/path/to/your/app" <username>

This way you offload process management from your provisioner to the system upstart daemon. You only need to do job management in a simple way (create/destroy services when a user is created or deleted).
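
As a rough illustration of the "create the config" step, the provisioner could render the job file from a template and write it out. This is only a sketch: the function names, the `[username]_service` naming, and the username check are assumptions for the example, and writing to `/etc/init` requires root, so the target directory is left as a parameter.

```python
import os
import re

# Mirrors the init script shown above; {username} and {app_path} are filled in.
JOB_TEMPLATE = """description "start Service for {username}"

start on runlevel [2345]
stop on runlevel [!2345]

respawn

exec /bin/su -c "{app_path}" {username}
"""

# Conservative check so a username cannot inject text into the job file.
USERNAME_RE = re.compile(r"^[a-z_][a-z0-9_-]*$")


def job_name(username):
    """Return the per-user job name, e.g. 'alice_service'."""
    return "{0}_service".format(username)


def render_job(username, app_path):
    """Return the Upstart job definition text for one user."""
    if not USERNAME_RE.match(username):
        raise ValueError("unexpected username: %r" % (username,))
    return JOB_TEMPLATE.format(username=username, app_path=app_path)


def write_job(username, app_path, init_dir="/etc/init"):
    """Write <init_dir>/<username>_service.conf and return its path."""
    path = os.path.join(init_dir, job_name(username) + ".conf")
    with open(path, "w") as handle:
        handle.write(render_job(username, app_path))
    return path
```

After `write_job(...)` returns, the job can be started by name with the `start` command, and deleting the file after a `stop` removes the service again.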

edited Jul 29 '13 at 09:49

answered Jul 29 '13 at 08:58


"create the needed init config for each service". The service name would be same, just the instance would be different. I would have to put all such `.conf` files into /etc/init??...That is not a good idea.. "You could then monitor your instances". Are you saying that I run the upstart jobs from within my provisioning service and check the status? – auny Jul 29 '13 at 09:27
Also, how do I restart if one of the user service crashes? – auny Jul 29 '13 at 09:29
Well, if you need to create an instance for each user, you could name them _. The naming is up to you. But seeing as every user has a different name, I think it's fairly safe naming scheme. You would not run upstart jobs from within your provisioner. Your provisioner would create init scripts and start the services using subprocess.Popen (start _). You can monitor your app by issuing a "status service_name". Upstart should take care of respawning, but this check would be easy to implement and potentially useful. – Gabriel Samfira Jul 29 '13 at 09:41
Makes sense. The `/etc/init` directory will get flooded. There can be a large number of users and a large number of such different services. Can upstart be made to use some other directory in addition to the default one? – auny Jul 29 '13 at 09:46
I have updated my answer to be a bit more explicit. I am not sure if upstart has the option to use a different folder for init scripts, but it should be fine if you use the default one. Just use a common prefix for your configs and you should be able to filter easily. How many such services are you planning on starting? I use this setup on a few machines that start up to 50 user services (memcache instances that listen on sockets) without any problems. You will probably run into resource shortage before upstart complains about the nr of configs :). – Gabriel Samfira Jul 29 '13 at 09:53

eri · Answer 2 · 2013-07-29T08:46:57.163

0

On Debian-like systems you can wrap a non-daemonized service with

start-stop-daemon --start --quiet --background --make-pidfile --pidfile $PIDFILE --exec $DAEMON --chuid $USER --chdir $DIR -- \
    $DAEMON_ARGS

Children must die after processing a task. The parent process must be as simple as possible: only "receive task - spawn child" in its main loop.

edited Jul 29 '13 at 08:46

answered Jul 29 '13 at 08:41


That is not what I want. The description clearly states that the children cannot exit – auny Jul 29 '13 at 08:54
