2

I'm looking to build a high-performance website. It has thousands of static HTML pages that are rendered ahead of time based on form submissions. I have a Ruby script that generates these static HTML pages and stores them on the server.

Now I'm expecting 1000+ concurrent users on the site. What is the fastest way to serve them? I believe Nginx + Varnish can do an extremely good job in this kind of scenario. Are there any further optimizations I can do?

Is there a way to have Nginx + Varnish serve the HTML pages from RAM instead of hitting the disk, perhaps using memcached somehow?

I'm already considering moving the other static assets like images and stylesheets to a CDN. Please advise on the best way to go about this.

Thanks!

[Reposted from StackExchange : https://stackoverflow.com/questions/6439484/building-a-high-performance-static-website]

Karthik Kastury
  • 133
  • 1
  • 5
  • If your entire site is static, why not pull everything through a CDN? It'd be faster for your users, and your server wouldn't have to worry about its own caching, etc. You could just serve the static files with something simple like Apache and let the CDN worry about the upstream caching –  Jun 22 '11 at 18:04
  • So I could simply host my Files on Rackspace CDN or Amazon Cloudfront, and have it do the hard work? – Karthik Kastury Jun 23 '11 at 03:50
  • I'd suggest that's the easiest option. You could even store the files on Amazon S3, now that it lets you select an index document http://docs.amazonwebservices.com/AmazonS3/latest/dev/index.html?IndexDocumentSupport.html –  Jun 23 '11 at 10:12
  • @samarudge But would that option support URL Rewriting? Like I don't want .html in the URL. – Karthik Kastury Jun 24 '11 at 04:02
  • Unfortunately not, no. If you wanted URL rewriting you would need to do that at your origin; both Nginx and Apache support rewrite rules, expires headers, and ETags, so I'd suggest you look into one of those. I'd guess that, provided you set your headers correctly, a single Amazon EC2 `small` instance could handle all your origin needs. –  Jun 24 '11 at 07:33
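The extensionless-URL rewriting discussed in the comments can be sketched at the origin with nginx's `try_files` directive. This is a minimal illustration; the document root and listen port are assumptions, not from the original thread:

```nginx
server {
    listen 80;
    root /var/www/site;  # assumed location of the generated pages

    location / {
        # Map a request for /about to /about.html on disk,
        # falling back to a directory index, then 404.
        try_files $uri $uri.html $uri/ =404;
    }
}
```

With this in place the generator can keep writing `about.html` while visitors see only `/about` in the address bar.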

4 Answers

4

While Varnish is absolutely fantastic with its flexible VCL, it's really more suitable for caching dynamic websites. There seems to be a general consensus that nginx outperforms Varnish (at least on small static objects).

You can use proxy_cache, fastcgi_cache, or simply serve from disk directly with nginx. I know it does support memcached, but the only benefit of memcached would be if you have multiple servers sharing the same cache; apart from that I can only see extra overhead.
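The memcached support mentioned above is nginx's `memcached_pass` directive. A rough sketch, assuming the Ruby generator (or some other process) has already stored each page in memcached keyed by its URI — the server address and fallback location name are illustrative:

```nginx
location / {
    set $memcached_key "$uri";
    memcached_pass 127.0.0.1:11211;   # assumed local memcached instance
    default_type text/html;
    error_page 404 502 504 = @disk;   # fall back to disk on a cache miss
}

location @disk {
    try_files $uri $uri.html =404;
}
```

Note that nginx only reads from memcached; something else has to populate the keys, which is one more moving part compared to just letting the filesystem cache do its job.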

You could either let your filesystem (and hopefully your RAID controller) cache the most-used data, or just stick it all on a ramdisk!
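The ramdisk option can be sketched with a tmpfs mount (requires root; the size and paths here are placeholders, not recommendations):

```shell
# Create a RAM-backed filesystem and copy the generated pages into it.
mkdir -p /mnt/ramdisk
mount -t tmpfs -o size=512m tmpfs /mnt/ramdisk

# Sync the generated pages into RAM; re-run whenever the
# Ruby script regenerates the site.
rsync -a /var/www/site/ /mnt/ramdisk/
```

Then point the web server's document root at `/mnt/ramdisk`. Keep in mind tmpfs contents vanish on reboot, so the sync step must run at boot as well.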

I am confident that a fairly budget Xeon server with a few GB of RAM will easily handle a few thousand requests per second, given that you're really only serving static content. It should also be possible to pre-compress all the static content so you don't add that extra overhead on every request.
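The pre-compression idea can be bolted onto the existing Ruby generator. A sketch: write a `.gz` sibling next to every generated `.html` file, so a server configured with nginx's `gzip_static` module can send the pre-compressed copy directly instead of compressing per request (the function name and example path are mine, not from the thread):

```ruby
require "zlib"
require "find"

# Walk a directory tree and write a gzip-compressed sibling
# (page.html -> page.html.gz) for every HTML file found.
def precompress_html(root)
  Find.find(root) do |path|
    next unless path.end_with?(".html")
    Zlib::GzipWriter.open("#{path}.gz") do |gz|
      gz.mtime = File.mtime(path).to_i  # preserve the source mtime
      gz.write(File.binread(path))
    end
  end
end

# Example (assumed path): precompress_html("/var/www/site")
```

Running this as the last step of the generation script means the web server never spends CPU on compression at request time.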

3molo
  • 4,330
  • 5
  • 32
  • 46
2

Nothing really beats just pushing files to the client. If you can do this, I would try really hard to keep doing it, as it scales easily and doesn't require fancy caching, merely shipping the prepared files to the client. Nginx is particularly good at that task: due to its event-based processing model, it will use orders of magnitude fewer resources, and will probably be much faster than the thread-based Varnish when you have many simultaneous connections.

Also, basic filesystems are really good at caching files for repeated reads. Thus, I would think you are just fine with a simple server without much CPU power, but with enough memory for the filesystem cache and a good I/O subsystem. The optimal system would be able to cache all needed objects in RAM, so you should be fine with a system with roughly (size-of-working-set + 1 GB) of memory. Depending on the size of your objects, I'd also strongly recommend using SSDs, at least as an intermediate caching layer in front of the rotating rust, so that objects not already in RAM can be retrieved quickly. The I/O pattern you will observe will probably be rather random, which SSDs handle easily but spinning disks are rather bad at.

Generally, the important measure is not the number of users surfing your site, but the actual number of concurrent TCP connections to your server and the required throughput. With some tuning, you can easily fill a gigabit pipe with a system costing about $3000.

Holger Just
  • 3,325
  • 1
  • 17
  • 23
  • Will Nginx automatically move resources that are repeatedly requested into an in-memory cache? – Karthik Kastury Jun 23 '11 at 03:50
  • 1
    @Karthik: That has nothing to do with Nginx; it is a feature of just about every filesystem. Filesystems typically use all available (free) RAM to cache data they have read, so that future reads do not have to go to the rotating rust. Every application takes advantage of this. – Holger Just Jun 23 '11 at 09:17
0

A more standard way to do this would be to generate the results on the fly and then serve subsequent requests out of some sort of cache.

You could do this in code with something like Redis or memcached (see a basic Python memcached example at https://stackoverflow.com/questions/868690/good-examples-of-python-memcache-memcached-being-used-in-python). You could also just use a caching proxy like nginx to do this automatically.
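The generate-once, serve-from-cache pattern described above can be sketched in a few lines of Ruby. A plain Hash stands in for memcached/Redis here (in production you would swap in a real client such as the dalli gem), and `render_page` is a hypothetical stand-in for the asker's generator script:

```ruby
# Minimal cache-aside sketch: the block runs only on a cache miss,
# and subsequent requests for the same key are served from memory.
class PageCache
  def initialize
    @store = {}  # key => rendered HTML; stand-in for memcached
  end

  def fetch(key)
    @store[key] ||= yield  # generate only on the first miss
  end
end

# Hypothetical generator; the real one is the asker's Ruby script.
def render_page(path)
  "<html><body>Rendered #{path}</body></html>"
end

cache = PageCache.new
html = cache.fetch("/about") { render_page("/about") }
```

The first `fetch` pays the generation cost; every request after that is a memory lookup, which is exactly what a caching proxy like nginx does for you at the HTTP layer.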

Kyle Brandt
  • 83,619
  • 74
  • 305
  • 448
  • I agree with Kyle, and you could use either Varnish or nginx for this. Seeing how simple this task is, I would go with nginx. – 3molo Jun 22 '11 at 17:52
0

(low-quality answer incoming, others know this stuff better than me):

I am pretty sure that nginx in combination with blazing-fast I/O (RAM + SSD) will easily serve 1000 clients with pure static HTML content.

pauska
  • 19,620
  • 5
  • 57
  • 75