I'm working with a complex system that has a couple of decades of history behind it. It started as a client/server dispatching app, but it's gotten more complicated. Originally, each customer had his own instance, running on his own servers. Some customers are still running in that mode, but others are running in a Software-as-a-Service mode, where all the applications run on our servers. And we've added web interfaces, so we now have hundreds of customers who access their systems solely through the web.
As it currently exists, each installation of the system consists of:
- a database: in which nearly every record has a primary key beginning with "customerid", so multiple customers can run against the same database. (See the first sketch after this list.)
- an installation directory: a directory on the SAN whose subdirectories hold the executables, log files, configuration files, disk-based queues, and pretty much everything else involved in the system that isn't a website
- background apps: a bunch of applications, located in a subdirectory of the installation directory but potentially running on one or more application servers, which are responsible for communicating with various off-site systems, mobile users, etc. They can be configured to run as Windows Services, or run from the command line. (See the dual-mode sketch after this list.)
- client apps: another bunch of applications, located in the same subdirectory but running on any number of user machines, with which managers and dispatchers interface with the system: dispatching work to the various mobile users, running reports on the work done, etc.
- web apps: a couple of web sites/applications/services that allow dispatch users to perform certain dispatching functions, and allow mobile users to complete their assigned work from any web browser. Generally, there's a many-to-one relationship between web sites and system installations: we'll have a number of sites on a number of server platforms configured to run against any given installation of the system, and use a load balancer to distribute incoming users across them.
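To make the database bullet concrete, here's a minimal sketch of the "customerid"-first key pattern. The table and column names (WorkOrders, OrderId, Status) are invented for illustration, not our actual schema:

```csharp
// Minimal sketch (illustrative names only) of the shared-schema pattern:
// every table's primary key begins with CustomerId, so one database can
// host many customers, and every query is scoped to exactly one of them.
using System.Data.SqlClient;

static class WorkOrderStore
{
    public static string GetOrderStatus(
        string connectionString, int customerId, int orderId)
    {
        using (var conn = new SqlConnection(connectionString))
        using (var cmd = new SqlCommand(
            "SELECT Status FROM WorkOrders " +
            "WHERE CustomerId = @customerId AND OrderId = @orderId", conn))
        {
            cmd.Parameters.AddWithValue("@customerId", customerId);
            cmd.Parameters.AddWithValue("@orderId", orderId);
            conn.Open();
            return (string)cmd.ExecuteScalar();
        }
    }
}
```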
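And the "Windows Service or command line" bit on the background apps is, conceptually, the standard dual-mode pattern. This isn't our literal code, just a sketch of the idea, with an invented service name:

```csharp
// Sketch of a dual-mode background app: runs as a Windows Service under the
// service control manager, or in the foreground when launched from a console.
// "DispatchBridge" is an invented name.
using System;
using System.ServiceProcess;

class DispatchBridge : ServiceBase
{
    protected override void OnStart(string[] args) { /* start worker threads */ }
    protected override void OnStop() { /* shut down cleanly */ }

    static void Main(string[] args)
    {
        if (Environment.UserInteractive)
        {
            // Command-line mode: run the same logic in the foreground.
            var app = new DispatchBridge();
            app.OnStart(args);
            Console.WriteLine("Running; press Enter to stop.");
            Console.ReadLine();
            app.OnStop();
        }
        else
        {
            // Service mode: hand control to the service control manager.
            ServiceBase.Run(new DispatchBridge());
        }
    }
}
```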
We have upwards of a dozen different installations, each serving from one to several hundred customers. (And from a handful to several hundred users per customer.)
The older background apps are written in unmanaged C++, the newer ones in C#. The client apps are written in VB6, running against a COM object written in unmanaged C++. The websites and services are ASP.NET and ASP.NET MVC, written in C#.
Clearly, it's gotten quite complicated over the years, with a lot of parts and a lot of inter-relationships. That it still works, and works well, surprises me, and makes me think we didn't do too badly when we first architected the beginnings, 20 years ago. But...
At this point, our biggest problem is the effort needed to install updates and upgrades. Much of the system is decoupled, so we can change one communications program, or fix a web page, etc., without much difficulty. But any change to the database schema pretty much mandates a system-wide change. And that takes significant time, affects many customers, and involves real risk. So the implementation of fixes gets delayed, which makes the risk when we do do an upgrade even higher, which results in more delay, and it's generally hurting our responsiveness.
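To give a feel for that coupling: this isn't literally our mechanism, but the effect is as if every app carried a startup check like the one below, where the SchemaVersion table and the expected value are hypothetical. Once the schema moves, everything that touches the database has to move with it:

```csharp
// Illustrative only - not our actual code, but it captures the effect:
// every app is effectively built against one schema version, so a schema
// change forces a lockstep upgrade of everything that touches the database.
using System;
using System.Data.SqlClient;

static class SchemaGuard
{
    const int ExpectedSchemaVersion = 42; // baked into every build (hypothetical)

    public static void EnsureSchemaVersion(string connectionString)
    {
        using (var conn = new SqlConnection(connectionString))
        using (var cmd = new SqlCommand(
            "SELECT TOP 1 Version FROM SchemaVersion", conn))
        {
            conn.Open();
            int actual = (int)cmd.ExecuteScalar();
            if (actual != ExpectedSchemaVersion)
                throw new InvalidOperationException(string.Format(
                    "Schema is v{0} but this build expects v{1}; refusing to start.",
                    actual, ExpectedSchemaVersion));
        }
    }
}
```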
So, I'm looking for advice as to architectural changes we might make, that would make upgrades less risky and less expensive.
In my ideal world, we'd never upgrade a running installation. We'd install an upgrade in parallel, test it, and once we were confident that it was working, move customers from the old system to the new one, at first one at a time and later in bulk as we grew confident, with the ability to roll a customer back to the old system if things didn't work. But I see some problems with that:
- We don't know what customer a user belongs to until after he's logged in. (See the routing sketch after this list.)
- Moving a customer from one system to another involves copying hundreds of thousands of database records, and applying schema changes in the process. (A sketch of the sort of copy I mean also follows this list.)
- Moving a customer from one system to another also involves copying who knows how many files in our disk-based queues, and other assorted supporting files.
- Rolling back is, I think, necessary. But it's going to be even more difficult.
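For the first bullet, what I imagine is a small post-login router that looks up which installation currently owns a customer's data and sends the session there. All the names and URLs below are hypothetical:

```csharp
// Hypothetical post-login router: we only learn the customer at login, so a
// small mapping (presumably a shared lookup table, hard-coded here for the
// sketch) would send each session to the installation that owns the data.
using System.Collections.Generic;

class InstallationRouter
{
    readonly Dictionary<int, string> customerToInstallationUrl =
        new Dictionary<int, string>
        {
            { 1001, "https://install-old.example.com" }, // not yet migrated
            { 1002, "https://install-new.example.com" }, // moved to the upgrade
        };

    // Called after authentication, once we finally know the customer.
    public string UrlForCustomer(int customerId)
    {
        string url;
        if (!customerToInstallationUrl.TryGetValue(customerId, out url))
            throw new KeyNotFoundException(
                "No installation registered for customer " + customerId);
        return url;
    }
}
```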
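And for the data move itself, this is the sort of per-customer copy I have in mind: stream every record for one customer out of the old installation, reshape it to the new schema in flight, and insert it into the new one. All names, and the "split one column into two" transform, are invented for illustration:

```csharp
// Hypothetical per-customer move, applying a schema change in flight.
// Here the new schema splits the old Address column into Street and City.
using System.Data.SqlClient;

static class CustomerMover
{
    public static void MoveWorkOrders(
        string oldConnStr, string newConnStr, int customerId)
    {
        using (var oldConn = new SqlConnection(oldConnStr))
        using (var newConn = new SqlConnection(newConnStr))
        {
            oldConn.Open();
            newConn.Open();

            var select = new SqlCommand(
                "SELECT OrderId, Address FROM WorkOrders WHERE CustomerId = @cust",
                oldConn);
            select.Parameters.AddWithValue("@cust", customerId);

            using (var reader = select.ExecuteReader())
            {
                while (reader.Read())
                {
                    // Apply the schema change in flight.
                    string[] parts = reader.GetString(1).Split(',');

                    var insert = new SqlCommand(
                        "INSERT INTO WorkOrders (CustomerId, OrderId, Street, City) " +
                        "VALUES (@cust, @order, @street, @city)", newConn);
                    insert.Parameters.AddWithValue("@cust", customerId);
                    insert.Parameters.AddWithValue("@order", reader.GetInt32(0));
                    insert.Parameters.AddWithValue("@street", parts[0].Trim());
                    insert.Parameters.AddWithValue("@city",
                        parts.Length > 1 ? parts[1].Trim() : "");
                    insert.ExecuteNonQuery();
                }
            }
        }
    }
}
```

Rolling a customer back would mean doing that copy in reverse, down-converting the schema, and flipping the router mapping back, which is why the last bullet worries me.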
What we have is working, but it's not working well. And I was hoping for some advice.
I'm not looking for answers, exactly; I'm looking for ideas on where to look. Anyone have any ideas on where I could find information on how to deal with structuring systems of this scale?