
What is the organic growth process from a standalone solution into a Software as a Service? Clearly, scalability is not a "feature" tacked on at the end of development, so I'm interested in the high-level code and architecture changes required.

  • Does one pick an existing platform and overnormalize it?

  • Does one start over with bare bones cloud architecture, then migrate legacy functionality?

  • Do aggressive technology upgrades (e.g. Web Forms to MVC) fit into the process?

Update:

I've been asked for some clarification on the current project architecture. Without going into too much detail, think of a .NET Web Forms application that plugs into a layer of business logic and integrates with multiple third-party vendors. Whenever new platform instances are required (I lack the terminology here; what I mean is when a new client requires business logic adjustments, integration with different third-party providers, hot new branding, etc.), the existing code is branched and a new environment is set up. Any changes are effectively very low-level, whether they happen directly in aspx files, in component code, or in db config.

This scenario seems perfectly suited to a "proper" SaaS model, but I'm having difficulty contributing constructively to the migration process. To rephrase the original questions, which strategy would be more efficient:

  • Overnormalize an existing platform and make everything configurable, effectively suspending this simulated scalability and not bringing on new clients until the architecture is refactored. The downside to this, IMHO, is continuing to rely on code and structure not built for scalability (details below).

  • Start from scratch with whatever is deemed (subjectively) the best architecture for the solution going forward, then migrate legacy functionality as needed. This allows for almost any desired technology upgrade, but it lacks visibility until completed and, being an aggressive change, will be seen by management as inherently high risk.
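To make the first option concrete: "make everything configurable" usually means turning each per-client branch into per-tenant data plus pluggable adapters. The Python below is a minimal sketch of that idea under stated assumptions, not the poster's .NET code; `TenantConfig`, `PaymentGateway`, the vendor adapters and the discount rule are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Protocol


class PaymentGateway(Protocol):
    """Hypothetical integration point; each third-party vendor gets an adapter."""

    def charge(self, amount_cents: int) -> str: ...


class VendorA:
    def charge(self, amount_cents: int) -> str:
        return f"vendor-a:{amount_cents}"


class VendorB:
    def charge(self, amount_cents: int) -> str:
        return f"vendor-b:{amount_cents}"


# Supporting a new vendor means registering an adapter, not branching the code.
GATEWAYS: Dict[str, Callable[[], PaymentGateway]] = {
    "vendor_a": VendorA,
    "vendor_b": VendorB,
}


@dataclass(frozen=True)
class TenantConfig:
    tenant_id: str
    brand_name: str
    theme_color: str
    payment_vendor: str
    # Business-rule switches that previously lived in per-client branches.
    features: Dict[str, bool] = field(default_factory=dict)


# In practice this would be a database table, not a literal.
TENANTS: Dict[str, TenantConfig] = {
    "acme": TenantConfig("acme", "Acme Corp", "#cc0000", "vendor_a", {"discounts": True}),
    "globex": TenantConfig("globex", "Globex", "#0033cc", "vendor_b"),
}


def checkout(tenant_id: str, amount_cents: int) -> str:
    cfg = TENANTS[tenant_id]
    if cfg.features.get("discounts", False):
        amount_cents = amount_cents * 90 // 100  # hypothetical 10% discount rule
    gateway = GATEWAYS[cfg.payment_vendor]()
    return gateway.charge(amount_cents)
```

With this shape, onboarding a new client becomes adding a configuration row rather than setting up a new branch and environment; the cost is that every client-specific difference has to be expressible as data or as an adapter.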

Personally, I'm leaning towards the second option because of the amount of legacy code present and the lack of sufficient db normalization. At the same time, the existing solution is mature and functional (if it isn't broken, don't fix it), and there are likely many more ways to scale than the two approaches I've listed above.

If the context above allows for scenario-specific advice, I'll take it. However I'm still open to more general do-s and dont-s and pointers suitable for a wider audience.

Oleg
  • What's your current design (the standalone solution you mentioned)? Can you update the question with some details? What kind of components (or applications) does it contain? – Marcel N. Jul 01 '12 at 06:11
  • @thecoon: I was hoping to get general advice that's not solution-dependent; however, I'd be happy to add a bit of an overview tomorrow if it will help with your answers – Oleg Jul 01 '12 at 08:11
  • Well, at least your last point implies that you have an existing web forms app which you want migrated. So a bit more info would help. – Marcel N. Jul 01 '12 at 08:13
  • @o.v. were you able to take a look at the last version of my answer? Does it answer your question? I will extend it and write as an article on some place online, soon, and will share the link here. Please let me know via comments if you think I can improve it by ..? :) – hasanyasin Jul 08 '12 at 11:31

5 Answers


Very Short Answer

The key is architecture. The way is to divide and conquer. The approach is to relax.

A Little Longer Answer

Architecture

The most important component of a building is its architecture: the way space is shaped by walls and floors, windows and ceilings, i.e., by the elements of the construction itself. An architect's aim is not to design the walls. He designs them as a secondary part of the real job: designing the space that the walls shape. We do not build buildings to have walls; we build them to have the space inside.

We first design the functionality, which is what we want from the software. Then we go into the details of making it possible, i.e., building the product. The technologies we use are not the main thing; they are the walls, and what we really want is the space itself.

With a good understanding of what we want from the software we are building, engineering becomes a much easier and almost automatic process. Technical difficulties then appear with well-understood definitions and, fortunately, obvious solutions.

Divide and Conquer

One very important measure of a good architecture is that it clearly defines the components of the solution. When we can see these components of the big picture separately, we can divide the work into separate parts and build separate things that work together.

Relax

Maybe this title sounds like I am trying to bring some humor to the text, but please don't get me wrong. If you have a good architecture, you can relax, and so can your web servers, database servers, engineers, customers, users, and the rest of the world. If that is not the case, it means you should go back to work on your architecture.

Huh?

Hey, what the heck have I been talking about here? It has been several paragraphs and three headings, and I have not used a single software term except "software". Such abstract chatter is for people who have nothing to do all day but lazily walk around and talk and talk and talk... We are software people and we don't have time for this. Hey, Hasan, cut it short!

Okay, I will try; but first of all, let's relax... then we can take a look at some real examples.

Examples

Let's say we are developing a web publishing service for professionals and individuals alike. Every client will have their own website running on our system. These can be ordinary personal websites with very few visitors or, if our business gets lucky, big publications like the New York Times.

We need to solve two kinds of scalability problems. The first is scaling our business, our system, as we get more and more clients running more and more websites. This one is pretty easy compared to the second: scaling a single website as it gets more and more visitors, more and more data, and more and more applications running on that data.

We can rewrite the question "how to scale" as "how to divide" to see the solution more clearly. If we can divide something into small pieces, we can scale it horizontally by adding more resources to handle those pieces.

We will have data and applications that will work on that data. Let's say we have one database server and one web server and try to make this scalable.

Consider the web servers we will run for our service: if we do not keep data on those machines, they become generic, interchangeable components, small clients of the data back-end that expose this data to the rest of the world. By keeping our web servers light, silly and empty, we can easily run many of them to handle an increasing number of requests.

Okay, turning web servers into just silly proxies is not the smartest idea. We need things to be done and applications to be run. And because web servers are the easiest part of our architecture to multiply, we will want to do as much as possible on them. We will continue with this difficult problem under the "Dividing Smarter" heading below. Before that, let's look at the architecture we currently have on the table and how it scales.

We use load balancers to run many web servers in parallel, and we use DNS to divide our websites into groups, each pointing to its own load balancer, even before requests hit our system. For example, a.com, b.com and c.com go to load balancer 1, a-very-big-website.com goes to load balancer 2, and so on. Each group of a load balancer, a set of web servers and a database server forms a separate universe in our system. Now we can have millions of websites and grow our system by adding more of these separate universes without any limits. We can serve as many clients with as many websites as our marketing department can bring. Our first problem is already solved. What about running big, big, big websites?
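The "separate universes" idea can be modelled as a small routing table. In the answer the split happens in DNS before requests reach the system; the Python below only models the resulting mapping so the structure is visible. The universe names, the `example.net` load-balancer hostnames and the round-robin choice of web server are assumptions for illustration.

```python
from dataclasses import dataclass
from itertools import cycle
from typing import Dict, List, Tuple


@dataclass
class Universe:
    """One load balancer, its stateless web servers, and one database server."""
    name: str
    load_balancer: str
    web_servers: List[str]
    database: str

    def __post_init__(self) -> None:
        # Web servers keep no data, so any of them can serve any request;
        # simple round robin stands in for the load balancer's real policy.
        self._next = cycle(self.web_servers)

    def pick_web_server(self) -> str:
        return next(self._next)


UNIVERSES: Dict[str, Universe] = {
    "u1": Universe("u1", "lb1.example.net", ["web1a", "web1b"], "db1"),
    "u2": Universe("u2", "lb2.example.net", ["web2a", "web2b", "web2c"], "db2"),
}

# What DNS effectively encodes: which universe each site lives in.
SITE_TO_UNIVERSE: Dict[str, str] = {
    "a.com": "u1",
    "b.com": "u1",
    "c.com": "u1",
    "a-very-big-website.com": "u2",  # a big site gets a universe of its own
}


def route(hostname: str) -> Tuple[str, str, str]:
    """Return (load balancer, web server, database) for a request."""
    universe = UNIVERSES[SITE_TO_UNIVERSE[hostname]]
    return universe.load_balancer, universe.pick_web_server(), universe.database
```

Growing for more clients then means adding another `Universe` and pointing new sites at it; the existing universes do not change.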

Dividing Smarter

Of course, we cannot divide a single website into separate universes as we do with separate websites, but this does not mean we cannot divide anything at all. We will keep dividing and conquering. To do this, we need a closer look at the problems we solve.

What is a website? Web pages, supporting content like CSS and JS files, multimedia content like images and videos, and data, lots of data. Thanks to CDNs and the excellent storage services that cloud platforms provide, static files are no longer an important part of our problem...

The real thing we do is rendering web pages. Above, we thought of our web servers as very light, generic interfaces to our database back-end. We did not solve how to run applications in our universes yet. Now it is time to do that.

Every request to our system will come to a site and be processed by an application running for that site. The very first thing our web servers will do is decide which site a request belongs to. On our database server, we keep a table that maps hostnames to sites. For every new client website, we add one or more domains to this table for that site. With every request to our web servers, we query the database server and decide which site to load. Good?

No, not good. It is awful. Why?

Big Numbers, Small Numbers

We have a small number of websites but a very large number of requests. The number of websites in a universe changes much less frequently than other kinds of data, such as comments on blog sites. This table is updated maybe a few times a day in an established universe. Querying such a tiny table (a few thousand records, tiny!) for every request, again and again, all day is not smart. The smart way is to keep copies of this table on the web servers and refresh them only when the table changes. How do we know when the site list is updated? We can keep a row holding a version number for the table and increment it with each update, or we can keep a timestamp of the last update. Web servers can check the database server for this number and compare it with their local in-memory version. If the table is newer, they pull the data again, overwriting the local in-memory copy. This way, we reduce thousands of queries to a tiny number. Big numbers, small numbers...
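The version-number scheme above can be sketched in a few lines. This is a minimal sketch, not a production cache: `FakeSiteDatabase` is a hypothetical in-memory stand-in for the database server that counts queries so the savings are visible, and `refresh_if_stale` is assumed to be called periodically (for example, every few seconds from a timer) rather than on every request.

```python
from typing import Dict, Optional, Tuple


class FakeSiteDatabase:
    """Hypothetical stand-in for the database server; counts the queries it serves."""

    def __init__(self) -> None:
        self._sites: Dict[str, int] = {}
        self._version = 0
        self.full_reads = 0
        self.version_reads = 0

    def add_domain(self, hostname: str, site_id: int) -> None:
        self._sites[hostname] = site_id
        self._version += 1  # every change to the table bumps its version number

    def read_version(self) -> int:
        self.version_reads += 1
        return self._version

    def read_all(self) -> Tuple[int, Dict[str, int]]:
        self.full_reads += 1
        return self._version, dict(self._sites)


class SiteResolver:
    """A web server's in-memory copy of the hostname -> site table."""

    def __init__(self, db: FakeSiteDatabase) -> None:
        self._db = db
        self._snapshot: Tuple[int, Dict[str, int]] = (-1, {})

    def refresh_if_stale(self) -> None:
        local_version, _ = self._snapshot
        # Cheap check: compare one number instead of reloading the whole table.
        if self._db.read_version() != local_version:
            # Replace the snapshot in a single assignment, so concurrent
            # resolve() calls see either the old table or the new one.
            self._snapshot = self._db.read_all()

    def resolve(self, hostname: str) -> Optional[int]:
        # Served from memory: no database query per request.
        return self._snapshot[1].get(hostname)
```

Serving a thousand requests costs no database queries at all; only a version check per refresh interval and one full read after each real change.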

At this point, the materials we use in our buildings start to matter: which languages, what kind of platforms and database systems, etc. They matter now because they can make our architecture work better or worse. For example, for updates to this table, our database server could have a mechanism to notify web servers about the change. This way, we go even further and completely remove the unnecessary queries to the domain-site table. So, if the systems we have chosen provide such mechanisms, they are good choices for our architecture.
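Some databases do offer such a push mechanism (PostgreSQL's `LISTEN`/`NOTIFY` is one example). So that it runs without a database, the sketch below simulates the idea in-process: `NotifyingSiteTable` and `WebServer` are hypothetical classes, and the callback stands in for whatever notification channel the real system would use.

```python
from typing import Callable, Dict, List, Optional

Listener = Callable[[Dict[str, int]], None]


class NotifyingSiteTable:
    """Hypothetical stand-in for a database that pushes change notifications."""

    def __init__(self) -> None:
        self._sites: Dict[str, int] = {}
        self._listeners: List[Listener] = []

    def subscribe(self, listener: Listener) -> None:
        self._listeners.append(listener)
        listener(dict(self._sites))  # give a new subscriber the current table

    def add_domain(self, hostname: str, site_id: int) -> None:
        self._sites[hostname] = site_id
        for listener in self._listeners:
            # Push a fresh copy to every web server instead of being polled.
            listener(dict(self._sites))


class WebServer:
    def __init__(self, table: NotifyingSiteTable) -> None:
        self.sites: Dict[str, int] = {}
        self.reloads = 0
        table.subscribe(self._on_change)

    def _on_change(self, snapshot: Dict[str, int]) -> None:
        self.sites = snapshot
        self.reloads += 1

    def resolve(self, hostname: str) -> Optional[int]:
        return self.sites.get(hostname)
```

One design caveat: real notifications can be missed (for instance while a web server restarts), so systems built this way often keep an occasional version check like the one in the previous paragraph as a fallback.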

Dividing things in a smart way happens automatically when we understand well what we want from our software. It is very difficult to scale database servers, since we need the data to be together. By increasing the number of web servers, we scale horizontally without any limits, but this does not apply to the database server. A database server has to keep access to its data, and a single machine has limits beyond which we cannot scale it efficiently.

Most database systems provide ways of scaling, such as sharding or a shared-nothing architecture. There may come a time when you have to use these, but judging from what people share on forums, blogs, and elsewhere, IMHO people use them too aggressively and wrongly. They let their databases grow bigger and bigger, then say "hey, it is time to scale, let's add some shards." 99% of all these applications are blind-running. People throw their problems at software and expect them to be solved like magic. Unfortunately, they soon realize that there is no magic.

We should keep ourselves from blind-running solutions by watching our numbers (big numbers, small numbers), by understanding our system's inner workings, and by solving problems through architecture instead of heavy use of materials.

Here is an architectural solution: Architected Solution (by Calatrava).

Here are other solutions that depend on materials instead of good architecture: [Blind-run Solution 1], [Blind-run Solution 2]

Judge the difference for yourself.

How can we scale a database server? Instead of blindly dividing tables down the middle, we can rethink our data. Can we separate user account information from site templates? Of course, why not? Can we keep old data and fresh data on different database servers? A little more difficult, especially considering search facilities, but why not?
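Dividing by what the data is, rather than splitting tables blindly, can be as simple as a routing function in the data-access layer. This is a minimal sketch under assumptions: the connection names, the three data kinds and the 90-day boundary between "fresh" and "old" content are all hypothetical.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# Hypothetical connection names; in practice these would be connection strings.
ACCOUNTS_DB = "accounts-db"
TEMPLATES_DB = "templates-db"
FRESH_CONTENT_DB = "content-fresh-db"
ARCHIVE_CONTENT_DB = "content-archive-db"

FRESH_WINDOW = timedelta(days=90)  # assumed boundary between fresh and old data


def database_for(kind: str,
                 created_at: Optional[datetime] = None,
                 now: Optional[datetime] = None) -> str:
    """Pick a database server based on what the data is and how old it is."""
    if kind == "account":
        return ACCOUNTS_DB
    if kind == "template":
        return TEMPLATES_DB
    if kind == "content":
        if created_at is None:
            raise ValueError("content rows need a creation time")
        now = now or datetime.now(timezone.utc)
        if now - created_at <= FRESH_WINDOW:
            return FRESH_CONTENT_DB
        return ARCHIVE_CONTENT_DB
    raise ValueError(f"unknown data kind: {kind}")
```

As the answer notes, search gets harder with the age split: a query over all content would have to ask both content databases and merge the results.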

Divide smartly, not blindly! I accept that there will be times when you cannot divide any further, but come on, how many of us are working for Google or Facebook?

-- Hey man, we have a very large data set and when we run...

-- Shush. First, go back and check your data set. Does it really have to be a large data set?

Most of the time, no, it does not. We just don't want to admit it...

How to Migrate

Rebuilding everything from scratch takes time that many businesses cannot afford. A better way is to re-architect our current system without rewriting every component, instead separating its parts and redefining them as components. This is mostly analysis followed by small changes. Every function call in a system can easily be a point of division: we can just cut the system at that point into two pieces.
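One way to "cut the system at a function call" is to put an interface at that call site, wrap the legacy code behind it unchanged, and route callers to a rebuilt implementation gradually, for example one tenant at a time. This is close to what is often called the strangler-fig approach; that label and every name below (`PriceCalculator`, `LegacyPriceCalculator`, `NewPriceService`, `Router`) are illustrative assumptions, not something the answer prescribes.

```python
from typing import Iterable, Protocol, Set, Tuple


class PriceCalculator(Protocol):
    """The seam: callers depend on this, not on a concrete implementation."""

    def price(self, sku: str, quantity: int) -> int: ...


class LegacyPriceCalculator:
    """Wraps the existing code unchanged (here a stand-in)."""

    def price(self, sku: str, quantity: int) -> int:
        unit = {"basic": 100, "pro": 250}[sku]
        return unit * quantity


class NewPriceService:
    """The rebuilt component; it must reproduce the legacy results."""

    _UNIT_PRICES = {"basic": 100, "pro": 250}

    def price(self, sku: str, quantity: int) -> int:
        return self._UNIT_PRICES[sku] * quantity


class Router:
    """Sends migrated tenants to the new component and everyone else to legacy."""

    def __init__(self, legacy: PriceCalculator, new: PriceCalculator,
                 migrated: Set[str]) -> None:
        self._legacy, self._new, self._migrated = legacy, new, migrated

    def for_tenant(self, tenant_id: str) -> PriceCalculator:
        return self._new if tenant_id in self._migrated else self._legacy


def outputs_match(a: PriceCalculator, b: PriceCalculator,
                  cases: Iterable[Tuple[str, int]]) -> bool:
    """Compare old and new implementations on the same inputs before switching."""
    return all(a.price(sku, qty) == b.price(sku, qty) for sku, qty in cases)
```

Because both sides sit behind the same interface, the new piece can be checked against the old one and switched on for a single tenant, which keeps the migration visible and reversible.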

A relaxed look at your current system for just a few hours will give you plenty of ideas on how to divide it into pieces. Once you have divided them, it is very easy to re-architect everything and then rebuild your new system piece by piece. If I have a building and need a bigger building on the same land, building the new one without moving out all the people already living there is a very difficult job, but not impossible. When it comes to software instead of buildings, it is much easier. So?

Relax

It is software. It is soft. You can copy your data, make tests on it, delete everything, copy a million more times. Once your architecture is designed well, your mistakes never cause catastrophic events. It is very difficult to turn a 6-seat dinner table into one that can serve 60 guests; but software is ... software and we can easily do such things. Relax.

-- The question above touches on an area that is impossible to cover in just a few paragraphs. Based on this part of the question, "However I'm still open to more general do-s and dont-s and pointers suitable for a wider audience.", I tried to keep things general without diving into details. Although I tried to give a few tiny examples of practical applications of my principles, I know I have left many loose ends in this short text. I appreciate any criticism and questions in comments.

hasanyasin
  • +1, I like where this is going even though SO frowns upon (?) wordy responses; let me know when you plan to finish your answer – Oleg Jul 08 '12 at 04:47
  • I will make it clearer today and try to introduce some more specific ideas about your real question: the transition from a current system to a scalable one. Thank you very much for reading. Some people do not even read my answers because they are too long. :D – hasanyasin Jul 08 '12 at 06:03
  • I have just updated my answer with the last part. I hope it will be useful for everybody to see things from another one's point of view. If I made any mistakes in my text, I apologize in advance. – hasanyasin Jul 08 '12 at 07:29
  • Your answer is great but it's not always true that architecting an existing system is better than rebuilding from scratch. – Guillaume Sep 28 '12 at 13:35
  • I did not make such a distinction my main point. My main point is that scalability, extensibility, or any other problem a system can have is best solved with a good architecture. When going for a new system, how to migrate from the current approach to the new one is a business choice. What I tried to say is that it is always possible to migrate in steps. Starting from scratch shouldn't mean forgetting the experience. Any living system provides a very important knowledge base about the problem it is supposed to solve, through both its strong and weak points. – hasanyasin Oct 01 '12 at 19:15
I'm interested in high level code and architecture changes required.

Unfortunately, there is no "correct" answer to how you should go about altering your architecture. The solution depends on what your current architecture looks like, as well as on your capabilities and preferences as a developer. Some standalone systems may already HAVE a relatively scalable platform, others may need improvements as they start to gain traction, and still others may need to start over from scratch because their code base is unusable.

A solid code base is EXTREMELY IMPORTANT. Without an efficient and clean code base, it is unlikely that your architecture will ever scale desirably. Many companies make the mistake of putting band-aid after band-aid to solve short-term problems -- but in the long run, this never works out well. When something doesn't work right, take the time to fix it in the most logical way possible -- even if this means adjusting other code in your platform.

The best one can do is give you general advice on building a scalable system, e.g. use caching, design your system to scale horizontally, optimize your database, and use db indexing wherever appropriate. Some best practices are architecture-dependent, but there are some general principles that pretty much every scalable platform needs to follow. For in-depth coverage of good scalability techniques and design patterns, I would check out Scalability Rules: 50 Principles For Scaling Websites.
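Of the general advice above, caching is the easiest to show in isolation. Below is a minimal cache-aside sketch with a time-to-live. It is an assumption-laden illustration: the in-process dictionary stands in for a shared cache (in a horizontally scaled setup that would typically be an external cache server), and the loader function passed in is a hypothetical database read.

```python
import time
from typing import Any, Callable, Dict, Tuple


class CacheAside:
    """Read-through on miss, serve from memory while the entry is fresh."""

    def __init__(self, loader: Callable[[str], Any], ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._loader = loader
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: Dict[str, Tuple[float, Any]] = {}  # key -> (expires_at, value)

    def get(self, key: str) -> Any:
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]  # fresh hit: no database round trip
        value = self._loader(key)  # miss or expired: go to the source
        self._entries[key] = (now + self._ttl, value)
        return value

    def invalidate(self, key: str) -> None:
        # Call after a write so readers don't see stale data for a full TTL.
        self._entries.pop(key, None)
```

The `clock` parameter is injectable only so expiry can be tested deterministically; normally the default monotonic clock is used.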

As far as choosing a platform goes, that is completely up to you and your preferences as a developer. What do you like to code in? C#, Ruby, PHP? Go with the language and platform that your team agrees upon. I prefer Ruby on Rails and I love the MVC design pattern, but that doesn't mean it's the best solution for you. Go with what makes the most sense to you, and work with it to develop a scalable system.

In the past, when I have worked on systems that require high scalability, there has often come a point where I found the need to start over from scratch. Unfortunately, not everyone has the foresight to know all of the best practices and features they will require, and this often results in less-than-ideal database and platform designs. The process of developing a system gives a lot of insight into what is truly required and what the best methodologies are for that system. Thus, there have been a few times when I was halfway through a product, realized the need for a new code base, started over, and migrated whatever legacy code I found appropriate for the new design.

Michael Frederick
  • +1 for *A solid code base is EXTREMELY IMPORTANT. Without an efficient and clean code base, it is unlikely that your architecture will ever scale desirably.*. Couldn't be more true. That is unless you rewrite everything from scratch. – Marcel N. Jul 01 '12 at 21:17

My case is somewhat different, but it can still serve as one model of a transition from desktop directly to SaaS. My company does media monitoring. Lately, we discovered that media isn't just TV and radio anymore, and we moved to capturing every stream we can from the 'net. To provide our clients with a solution (they need ARCHIVES, not RECORDERS), we simply host it for them on our servers.

That gave us the ability to have everything under one roof and to maintain the code base directly at our 'cloud HQ'.

There is more to it, but I won't bore the readers; I will just point out that we didn't force ourselves into this. It was really organic and perfectly normal.

The architecture of our solution follows one big premise: every hardware node added to the system will be utilized completely, from network to storage to CPU. Every piece of software is written as 'task server' <-> 'agents', which run separately. So you can always buy more machines and run whatever you need at that point in time.
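The 'task server' <-> 'agents' split can be illustrated in-process with a queue and threads. This is only a sketch under assumptions: in the system described, agents run on separate machines and would reach the task server over the network, which is not modelled here, and `TaskServer` and `agent` are hypothetical names. The point it shows is the pull model: agents ask for work, so adding agents adds capacity without reconfiguring the server.

```python
import queue
import threading
from typing import Any, Callable, List, Optional


class TaskServer:
    """Holds pending work; agents pull from it, so more agents means more capacity."""

    def __init__(self) -> None:
        self._tasks: "queue.Queue[Any]" = queue.Queue()
        self._results: List[Any] = []
        self._lock = threading.Lock()

    def submit(self, task: Any) -> None:
        self._tasks.put(task)

    def next_task(self, timeout: float = 0.1) -> Optional[Any]:
        try:
            return self._tasks.get(timeout=timeout)
        except queue.Empty:
            return None

    def report(self, result: Any) -> None:
        with self._lock:
            self._results.append(result)
        self._tasks.task_done()  # lets wait() know this task is finished

    def wait(self) -> List[Any]:
        self._tasks.join()  # blocks until every submitted task was reported
        with self._lock:
            return list(self._results)


def agent(server: TaskServer, work: Callable[[Any], Any], stop: threading.Event) -> None:
    """One worker; in the real system this would be a process on another node."""
    while not stop.is_set():
        task = server.next_task()
        if task is None:
            continue
        try:
            result = work(task)
        except Exception as exc:  # report failures instead of losing the task
            result = exc
        server.report(result)
```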

Daniel Mošmondor

Have a read of this: http://static.usenix.org/event/lisa07/tech/full_papers/hamilton/hamilton_html/index.html

It gives a good overview of building scalable systems, both from the hardware/software point of view and from the human point of view of scaling (how many servers each engineer can look after).

The book The Lean Startup by Eric Ries (also available as a Kindle book) has some good pointers too.

DarcyThomas

The process is basically to create a client/server interface (HTTP, in the case of web-based SaaS) for interacting with your data...

If you have direct access to your data on your server, you can migrate all or a large part of your software solution to a web-based platform like ASP.NET, PHP, Ruby on Rails, Python with Django, or others. With this approach you can pass your data in several ways: JSON, OData, plain text, embedded in code (objects or arrays), etc. There are very clever solutions to these problems, generally involving AJAX and the ubiquitous XMLHttpRequest.
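As one small example of such an interface, the sketch below exposes existing data as JSON over HTTP using Python's standard WSGI conventions (WSGI being one option among the "others" listed). The `/customers/<id>` route and the `CUSTOMERS` dictionary are hypothetical stand-ins for whatever the legacy system already stores.

```python
import json
from typing import Callable, Dict, Iterable, List, Tuple

# Hypothetical data the legacy system already has access to.
CUSTOMERS: Dict[str, Dict[str, str]] = {
    "42": {"id": "42", "name": "Acme Corp", "plan": "pro"},
}


def _json_response(start_response: Callable, status: str, payload: dict) -> List[bytes]:
    body = json.dumps(payload).encode("utf-8")
    headers: List[Tuple[str, str]] = [
        ("Content-Type", "application/json"),
        ("Content-Length", str(len(body))),
    ]
    start_response(status, headers)
    return [body]


def app(environ: dict, start_response: Callable) -> Iterable[bytes]:
    """WSGI app: GET /customers/<id> returns that customer as JSON."""
    parts = environ.get("PATH_INFO", "").strip("/").split("/")
    if environ.get("REQUEST_METHOD") == "GET" and len(parts) == 2 and parts[0] == "customers":
        record = CUSTOMERS.get(parts[1])
        if record is not None:
            return _json_response(start_response, "200 OK", record)
    return _json_response(start_response, "404 Not Found", {"error": "not found"})
```

To try it locally, `wsgiref.simple_server.make_server("127.0.0.1", 8000, app).serve_forever()` serves it, and a browser-side script could then fetch `/customers/42` via XMLHttpRequest as the answer describes.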

In an extreme case, where you do not even have the source code of the software you are trying to use (imagine some COBOL or similar solution where you only have the executables), you can still create a broker that reads from your program's screens (AutoIt can do this), passes that data to your web interface, which presents it at the client endpoint, gets back the results from your web program, and injects them into the executable's interface (AutoIt again, or some other macro-control library).

You don't need MVC/MVVM/Rails/Django or the like, but they can make your life easier if you don't already have some experience separating actions from data and controlling the flow of web programs.

ZEE