
I'm working on the Azure cloud and trying to implement a scalable HTTP server that is primarily IO-bound.

Clarification: to have an example at hand, assume the server is a "blob proxy": a client connects, the server downloads a blob from Azure Storage and streams it to the client. That's it.

My goal, of course, is to squeeze the maximum number of concurrent clients out of a single machine.

Learning from node.js

It seems like node.js is a very natural fit for this type of problem. All the server does is IO. Node is fully asynchronous and could probably handle tens of thousands of concurrent connections on a single machine.

I've ultimately chosen C# and the .NET Framework over node.js, and I'm trying to implement a similar strategy with what C# has to offer. I'm not trying to replicate Node, only its approach.

My C# implementation on Azure

Currently I have an Azure cloud service containing a thin Web Role on IIS. Here is MyHandler.ashx:

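using System.Threading.Tasks;
using System.Web;
using Microsoft.WindowsAzure.Storage.Blob;

public class MyHandler : HttpTaskAsyncHandler
{
    public override async Task ProcessRequestAsync(HttpContext context)
    {
        // GetBlockBlobReference(...) is my own helper (arguments elided).
        CloudBlockBlob blob = GetBlockBlobReference(...);
        // Copy the blob into the response; no thread is blocked while awaiting.
        await blob.DownloadToStreamAsync(context.Response.OutputStream);
    }
}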

I have this simple routing in my Web.config:

<system.webServer>
  <handlers>
    <add verb="*" path="*" name="MyHandler" type="MyWebRole.MyHandler" />
  </handlers>
</system.webServer>

Discussion:

  1. I expect the new async/await API of .NET 4.5 to be just as asynchronous as the JS in node.js and pretty much as easy to use. Are there any serious mistakes in this assumption?

  2. Which tweaks/optimizations should I make to my Azure Web Role in order to maximize the number of concurrent connections?

    • I've heard I have to increase max connections since it defaults to only 12 × the number of cores. Is this true, and how exactly is it done?

    • Should I do anything to alter the default thread-pool?

    • Any other important tips/tweaks/optimizations?

  3. Should I drop IIS altogether? There are some "leaner" server implementations which I can host in a Worker Role. Could this significantly improve performance relative to IIS? Is it worth the trouble?

  4. Overall, am I on the right track? Is there a better way to do this with C#/.NET?

talkol
  • Some are calling this type of implementation node.cs :) – talkol Aug 15 '13 at 19:13
  • IIS/ASP.NET does everything you need natively (asynchronous handling, etc.), don't try to rewrite it "like node.js" in a handler. That being said, you can use node.js on azure if that's what you're really looking for. – Simon Mourier Aug 19 '13 at 06:47
  • @SimonMourier I am relying on IIS and .NET for everything I need, it's just different "packaging". ASP.NET MVC and friends include a lot of bloated features like a view-engine and a routing-engine which I don't want – talkol Aug 19 '13 at 07:42
  • I can understand that, I'm not a fan of bloat either :-) In this case, yes, you can write everything in an HTTP handler. You don't even need the asynchronous version, as the whole ASP.NET IIS architecture is already quite asynchronous (based on thread pools, request queues, etc.). – Simon Mourier Aug 19 '13 at 08:23
  • @SimonMourier I actually think the async API is the game changer here. Traditional thread-pools can't handle 10K concurrents on a single machine :) – talkol Aug 19 '13 at 08:54
  • I think you should have caching logic for the blobs. Just doing `blob.DownloadToStreamAsync(context.Response.OutputStream)` is a total waste of time. You are not going to improve on the [Scalability targets](http://blogs.msdn.com/b/windowsazure/archive/2012/11/02/windows-azure-s-flat-network-storage-and-2012-scalability-targets.aspx), which are 20k transactions per second for blobs, at all. – astaykov Aug 19 '13 at 21:46
  • @astaykov I wasn't aware of the scalability targets, thanks for that! I assumed blobs were as scalable as I could get because people use them as a CDN for static content (like images), and images alone could easily exceed these limits on a busy website – talkol Aug 20 '13 at 08:52
  • First of all, why are you streaming the blob? With Azure you are supposed to give the blob URL to your viewer. If you want to stream an authenticated blob, you can use a signed URL that is unique and expires after a given time. You are not using the blob as a CDN. Second, node is not faster than IIS in any way. IIS implements a lot of useful functionality such as sessions, cookies, authentication and much more. Most examples people use to compare node with IIS are incomplete node implementations, which will definitely be faster. – Akash Kava Aug 23 '13 at 06:04
  • @AkashKava Please regard the blob streaming example as what it is - an example. Assume I don't want to redirect to the blob URL itself. The purpose of this example is to compare the performance of various implementations. – talkol Aug 24 '13 at 08:32
  • @talkol, I have answered everything. – Akash Kava Aug 24 '13 at 13:08
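A minimal sketch of the caching idea from astaykov's comment, assuming the blobs are small enough to keep in memory. `BlobCache` and its `download` delegate are hypothetical names, not part of the Azure SDK; the delegate stands in for whatever actually fetches the bytes. The point it shows is that concurrent requests for the same blob share a single in-flight download instead of each going to storage.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical in-memory blob cache. Concurrent requests for the same name
// share one in-flight download through a Lazy<Task<byte[]>>.
public sealed class BlobCache
{
    private readonly ConcurrentDictionary<string, Lazy<Task<byte[]>>> _entries =
        new ConcurrentDictionary<string, Lazy<Task<byte[]>>>();
    private readonly Func<string, Task<byte[]>> _download;

    // 'download' is an assumed stand-in for the code that actually fetches the
    // blob bytes (for example a wrapper around the storage client).
    public BlobCache(Func<string, Task<byte[]>> download)
    {
        _download = download ?? throw new ArgumentNullException(nameof(download));
    }

    public async Task<byte[]> GetAsync(string blobName)
    {
        // GetOrAdd may build more than one Lazy under contention, but callers
        // only ever evaluate the stored one, so the download starts once.
        Lazy<Task<byte[]>> entry = _entries.GetOrAdd(
            blobName,
            name => new Lazy<Task<byte[]>>(
                () => _download(name),
                LazyThreadSafetyMode.ExecutionAndPublication));
        try
        {
            return await entry.Value.ConfigureAwait(false);
        }
        catch
        {
            // Evict the failed entry (only if it is still this one) so the
            // next request retries the download instead of reusing the fault.
            ((ICollection<KeyValuePair<string, Lazy<Task<byte[]>>>>)_entries).Remove(
                new KeyValuePair<string, Lazy<Task<byte[]>>>(blobName, entry));
            throw;
        }
    }
}
```

This sketch has no expiry or size limit; a real cache would need an eviction policy (for example `System.Runtime.Caching.MemoryCache`), and whether caching helps at all depends on how often the same blobs are requested.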

2 Answers

  1. Yes, the async API of .NET 4.5 will be as fast as Node. However, since Node is a bare-minimum server, it will always seem faster: by default Node does not manage sessions or provide the many other hooks that IIS offers for various functionality. IIS also provides application domains etc. to isolate different apps, and each such feature slows IIS down a little. Typical Node examples do not cover any of this higher-level functionality, which is why they seem faster.

  2. You will have to look into the configuration; I am sure it is configurable. Otherwise, you can drop the Azure Web Role and move to an Azure Web Site with reserved capacity, where you can completely customize web.config, or you can simply create a VM where you can configure IIS entirely, at any level. It's your private server.

  3. No. Dropping IIS without knowing everything it can do is reinventing the wheel. IIS is huge, and it has many add-ons and many security improvements. You may end up creating a server with security holes you don't realize exist, and that could be harmful.

  4. The async API is as far as you can go in C#, and along with it you get many benefits, such as full-featured Visual Studio with Entity Framework and far more advanced debugging capabilities. With Node or any custom implementation, debugging and writing common routines will eat up all your time. Instead, you can contribute to open-source IIS-related development, as many of its parts are now available on CodePlex.
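For point 2 of the question, a minimal sketch of where the two limits it asks about can be raised from code. `ConcurrencyTuning` and its parameters are hypothetical and the values are placeholders to be tuned by load testing; `ServicePointManager.DefaultConnectionLimit` and `ThreadPool.SetMinThreads` are standard .NET APIs. As far as I know, the "12 × cores" figure is what ASP.NET's `processModel autoConfig` applies to this outbound connection limit, which governs connections made through System.Net's `HttpWebRequest` stack; that the storage client library you use goes through that stack is an assumption worth verifying for your SDK version.

```csharp
using System;
using System.Net;
using System.Threading;

// Hypothetical startup helper. Call it once, early, in the process that
// serves requests (for example from Application_Start in Global.asax).
public static class ConcurrencyTuning
{
    public static bool Apply(int outboundConnectionLimit, int minWorkerThreads, int minIoThreads)
    {
        // Default cap on concurrent outbound HTTP connections per host for
        // System.Net's ServicePoint-based stack. It only affects ServicePoint
        // objects created after this line runs.
        ServicePointManager.DefaultConnectionLimit = outboundConnectionLimit;

        // Let the pool create this many threads on demand before it starts
        // throttling thread injection; returns false if the values are rejected.
        return ThreadPool.SetMinThreads(minWorkerThreads, minIoThreads);
    }
}
```

Because the connection limit only applies to connections to hosts that have not been contacted yet, the call belongs before the first request to storage.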

IIS with ASP.NET 4.5 and MVC is fully asynchronous as well, and you can create asynchronous controllers and handlers. In fact, most people do not know that asynchronous programming has been available since ASP.NET 2.0; the only problem was that async logic was difficult to write. Weighing the cost of development against the time it takes, we often ignore the benefits of asynchronous programming unless the return on the time invested is very high. But from .NET 4.5 onwards, the async and await keywords have made async programming very easy.
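To illustrate why async/await scales for IO-bound work like the blob proxy, here is a small simulation. `AsyncFanOutDemo` is a hypothetical name, and `Task.Delay` stands in for the real blob download (it is a timer, not network IO), so this only demonstrates the principle: while a request is awaiting, it holds no thread, so thousands of requests can be in flight on a handful of pool threads.

```csharp
using System;
using System.Diagnostics;
using System.Linq;
using System.Threading.Tasks;

// Simulated IO-bound request load. Task.Delay stands in for the blob
// download; while a request is awaiting it, no thread is blocked.
public static class AsyncFanOutDemo
{
    // Hypothetical request handler: "waits on IO" and then finishes.
    private static async Task HandleRequestAsync(TimeSpan simulatedIo)
    {
        await Task.Delay(simulatedIo).ConfigureAwait(false);
    }

    // Starts all requests at once and returns the total wall-clock time.
    public static async Task<TimeSpan> RunAsync(int concurrentRequests, TimeSpan simulatedIo)
    {
        Stopwatch stopwatch = Stopwatch.StartNew();
        Task[] requests = Enumerable.Range(0, concurrentRequests)
            .Select(_ => HandleRequestAsync(simulatedIo))
            .ToArray();
        await Task.WhenAll(requests).ConfigureAwait(false);
        stopwatch.Stop();
        return stopwatch.Elapsed;
    }
}
```

Calling `AsyncFanOutDemo.RunAsync(10000, TimeSpan.FromSeconds(1))` should take roughly one second of wall-clock time rather than time proportional to the number of requests divided by the number of threads. Unlike node.js, continuations resume on thread-pool threads rather than a single event loop, so CPU-bound work inside a handler still competes for those threads.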

Akash Kava

I would seriously suggest not reinventing the wheel here. The latest versions of IIS support WebSockets, and async request/response has been available for some time.

Regarding your comments on bloat, you can still be fairly lean. As for the view engine and routing engine, you don't have to use them, at which point, what exactly do you want? You can register a different or custom view engine. Rendering is such a small part of the interaction that it's almost not worth mentioning here, especially combined with output caching.

It really depends on what you are trying to get out of what NodeJS offers. NodeJS's biggest selling point is handling many thousands of simultaneously connected users under heavily IO- and event-bound loads: mainly filesystem/DB/service calls for IO, and WebSockets for events.

Without profiling a specific application, it's impossible to give specific advice, given how much comes in the box with ASP.NET + IIS (latest version). If you are going to reinvent the wheel, it's important to establish precisely what your needs actually are.


In this case, it looks like you want to handle a CDN-type load in IIS, in which case you should probably use an HTTP module if you want to stick with IIS.

I would suggest you look into a front-loaded CDN or caching reverse proxy if you don't need to limit resources to authenticated users. Given how tightly IIS and .NET are integrated, you aren't going to get around much of the bloat you mention any other way while sticking with IIS + .NET.

Beyond this, Microsoft already offers a CDN solution backed by Azure Blob storage. This is probably your best bet; if you need authentication, I believe there are methods for that as well. In any case, you can probably do what you want with .NET and C# via either the ASP.NET MVC or WebService interfaces... no views necessary.


As an aside, I really like node.js myself, and you may well want to use Varnish in front of a few node.js instances for the services in question. Azure supports Linux VMs, and we've been using Ubuntu and Docker, which has worked pretty well.

Tracker1