Going down one step from the load balancer, we get to the individual servers.
Whatever the application, it needs to economize on the number of threads it uses so that the OS scheduler spends as few CPU cycles as possible determining the highest-priority thread to run. One mechanism for doing this is I/O Completion Ports, which are found in Windows and some flavors of Unix. Search for IOCP here on SO.
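To make the idea concrete, here is a minimal sketch (not production code, and assuming a Windows build environment) of the IOCP pattern: a small pool of worker threads all block on a single completion port, so a thread is only woken when an I/O operation has actually finished. The worker count and the processing stub are placeholders.

```c
#include <windows.h>

#define WORKER_COUNT 4   /* roughly one per core; an assumption for the sketch */

static DWORD WINAPI worker(LPVOID param)
{
    HANDLE port = (HANDLE)param;
    DWORD bytes;
    ULONG_PTR key;
    LPOVERLAPPED ov;

    for (;;) {
        /* Blocks without burning CPU until some associated I/O completes. */
        if (!GetQueuedCompletionStatus(port, &bytes, &key, &ov, INFINITE))
            continue;
        /* ... process the completed I/O identified by key/ov here ... */
    }
    return 0;
}

int main(void)
{
    /* One completion port shared by all workers. */
    HANDLE port = CreateIoCompletionPort(INVALID_HANDLE_VALUE, NULL, 0, 0);

    for (int i = 0; i < WORKER_COUNT; ++i)
        CreateThread(NULL, 0, worker, port, 0, NULL);

    /* Sockets or files opened later are associated with the same port via
       CreateIoCompletionPort(handle, port, key, 0) before issuing
       overlapped reads/writes. */
    Sleep(INFINITE);
    return 0;
}
```

The point is that the number of runnable threads stays close to the number of cores regardless of how many connections are in flight, which is exactly the scheduler economy described above.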
Economizing on accesses to shared resources - communications, databases, buses, RAM and the L3 cache, to name a few - and trying to fit a thread and its data inside non-shared resources - the L2 and L1 caches - results in an application that scales better than one where these accesses are ignored. There are many examples of multi-threaded applications running slower than single-threaded ones.
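One classic illustration of this (a sketch under the assumption of a 64-byte cache line and POSIX threads; names are made up) is false sharing: each thread updates only its own counter, but if the counters sit on the same cache line, that line bounces between cores and the "parallel" version crawls. Padding each counter onto its own line keeps the hot data inside each core's L1/L2.

```c
#include <pthread.h>
#include <stdint.h>

#define CACHE_LINE 64        /* typical x86 line size; an assumption */
#define NTHREADS   4

struct padded_counter {
    uint64_t value;
    char pad[CACHE_LINE - sizeof(uint64_t)];  /* one counter per cache line */
};

static struct padded_counter counters[NTHREADS];

static void *worker(void *arg)
{
    struct padded_counter *c = arg;
    for (long i = 0; i < 100000000L; ++i)
        c->value++;          /* stays in this core's cache; no line bouncing */
    return NULL;
}

int main(void)
{
    pthread_t t[NTHREADS];
    for (int i = 0; i < NTHREADS; ++i)
        pthread_create(&t[i], NULL, worker, &counters[i]);
    for (int i = 0; i < NTHREADS; ++i)
        pthread_join(t[i], NULL);
    return 0;
}
```

Remove the padding and the threads contend on the same line even though they never touch each other's data - one of the ways a multi-threaded program ends up slower than a single-threaded one.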
Determining what a SOAP- or XML-formatted request is supposed to do is very CPU-intensive - the more text, the bigger the job. If the application uses binary requests, it will have more resources left over for performing the request and spend less understanding it. Another aspect of verbose requests and responses is that they gobble up communication bandwidth. A one-megabyte response requires roughly ten megabits of bandwidth - one tenth of a 100 Mbps connection's capacity for a full second. That limits you to at best 10 such responses per second. Want one thousand? You need responses no longer than about 10 kB.
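A back-of-the-envelope sketch of the size difference: the same few fields encoded as XML versus as a packed binary record. The field names and layout are invented for illustration, not taken from any particular protocol.

```c
#include <stdio.h>
#include <stdint.h>
#include <string.h>

#pragma pack(push, 1)
struct order_binary {            /* 14 bytes on the wire */
    uint32_t order_id;
    uint32_t customer_id;
    uint16_t quantity;
    uint32_t price_cents;
};
#pragma pack(pop)

int main(void)
{
    const char *order_xml =
        "<order><orderId>1234567</orderId>"
        "<customerId>7654321</customerId>"
        "<quantity>3</quantity>"
        "<priceCents>199900</priceCents></order>";

    struct order_binary b = { 1234567u, 7654321u, 3u, 199900u };

    printf("XML payload:    %zu bytes\n", strlen(order_xml));   /* well over 100 bytes */
    printf("binary payload: %zu bytes\n", sizeof b);            /* 14 bytes */
    return 0;
}
```

Roughly an order of magnitude per message, before you even count the cost of parsing the text - multiply that by your request rate and it decides how many responses fit through that 100 Mbps pipe each second.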
No matter how fast your application is, it will be held up whenever it has to go to another server to execute part of a request. This holds true even over fiber interconnects: a SAN is slower than physically attached storage.