Question: What are good strategies for achieving 0 (or as close as possible to 0) downtime when using Django?
Most of the answers I have read say "use South" or "use Fabric", but those are very vague answers IMHO. I actually use both, and am still wondering how to get as close to zero downtime as possible.
Some details:
I have a decently sized Django application that I host on EC2. I use South for schema and data migrations, as well as Fabric with boto for automating repetitive deployment/backup tasks that get triggered through a set of Jenkins (continuous integration server) jobs. The database I use is a standard PostgreSQL 9.0 instance.
I have a staging server that gets constantly edited by our team with all the new content and gets loaded with the latest and greatest code, and a live server that keeps changing with user accounts and user data, all recorded in PostgreSQL.
Current deployment strategy:
When deploying new code and content, two EC2 snapshots are created, one of each server (live and staging). The live server is switched to an "Updating new content" page...
Downtime begins.
The live-clone server gets migrated to the same schema version as the staging server (using South). A dump of only the tables and sequences that I want preserved from live gets created (particularly the user accounts along with their data). Once this is done, the dump gets uploaded to the staging-clone server. The tables that were preserved from live are truncated there and the dumped data gets inserted. As the data on my live server grows, this time obviously keeps increasing.
Once the load is complete, the Elastic IPs of the live server get reassigned to the staging-clone (which is thus promoted to be the new live). The live instance and the live-clone instance get terminated.
Downtime ends.
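To make the dump-and-reload step above concrete, here is a minimal sketch of how its two commands could be built. The table names and database name are hypothetical placeholders, not the real schema, and the helpers only construct the `pg_dump` invocation and the `TRUNCATE` statement rather than running them.

```python
# Sketch only: builds the commands for the "preserve user data" step.
# PRESERVED_TABLES and the database name are hypothetical examples.

PRESERVED_TABLES = ["auth_user", "profiles_userprofile"]


def build_dump_command(db_name, tables):
    """Return a pg_dump argument list that dumps only the data of `tables`."""
    cmd = ["pg_dump", "--data-only"]
    for table in tables:
        # One --table switch per table to include in the dump.
        cmd.append("--table=" + table)
    cmd.append(db_name)
    return cmd


def build_truncate_sql(tables):
    """Return one TRUNCATE statement covering all preserved tables.

    Listing the tables in a single statement lets PostgreSQL truncate
    tables that reference each other, as long as every referencing
    table is in the list.
    """
    return "TRUNCATE TABLE " + ", ".join(tables) + ";"


if __name__ == "__main__":
    print(" ".join(build_dump_command("live_clone_db", PRESERVED_TABLES)))
    print(build_truncate_sql(PRESERVED_TABLES))
```

The argument list could be handed to `subprocess` or a Fabric task on the live-clone, and the SQL run on the staging-clone before loading the dump. Sequence values may need separate handling depending on how the dump is taken.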
Yes, this works, but as data grows, my "virtual" zero downtime gets further and further away. Of course, something that has crossed my mind is to somehow leverage replication and to start looking into PostgreSQL replication and "eventually consistent" approaches. I know there is perhaps some magic I could do with load balancers, but the issue of accounts created in the meantime makes it tricky.
What would you recommend I look at?
Update:
I have a typical single-node Django application. I was hoping for a solution that would go more in depth into Django-specific issues. For example, the idea of using Django's support for multiple databases with custom routers alongside replication has crossed my mind. There are issues related to that which I hope an answer would touch upon.
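For illustration, a router of the kind mentioned above might look roughly like the sketch below. It assumes two entries in `DATABASES`, the usual `default` alias for the primary and a hypothetical `replica` alias for a read-only replica. The method names follow Django's database router interface; `allow_migrate` is the name used from Django 1.7 onward (older releases called it `allow_syncdb`).

```python
import random


class PrimaryReplicaRouter(object):
    """Route writes to the primary and reads to a replica (sketch only)."""

    primary = "default"     # alias of the writable database
    replicas = ["replica"]  # hypothetical alias(es) of read-only replicas

    def db_for_read(self, model, **hints):
        # Spread reads across the configured replicas.
        return random.choice(self.replicas)

    def db_for_write(self, model, **hints):
        # All writes go to the primary.
        return self.primary

    def allow_relation(self, obj1, obj2, **hints):
        # Primary and replicas hold the same data, so relations are fine.
        return True

    def allow_migrate(self, db, app_label, model_name=None, **hints):
        # Schema changes are applied on the primary only.
        return db == self.primary
```

It would be enabled by listing its dotted path in the `DATABASE_ROUTERS` setting. One issue with this approach is replication lag: a row written to the primary (a freshly created account, say) may not yet be visible on the replica when it is read back immediately.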