I am somewhat confused about how real-world web applications are architected, and I would appreciate some clarification. I know there are many methods and approaches, but an example from a well-known site like Facebook/Amazon/YouTube would suffice. Let's say I'm serving web content. I'm assuming I'll have a cluster of web servers and a cluster of DBs, with a load balancer in front of them.

My questions are:

  1. How do you store the HTTP code? Is it stored locally on each machine or on shared storage? If locally, how do I update the code when the website changes?

  2. The same goes for static content; I'm assuming it resides on shared storage.

  3. If I'm using a CDN, does it simply cache all accessed static data? And for how long?

  4. LBs - can I have a cluster of LBs? If so, how does it work?

  5. What DB would you pick for a YouTube/streaming-like site, and why?

I know it's a lot of questions, but I'd appreciate it if I could get answers to all of them. Thanks!

John Doe

1 Answer

How do you store the HTTP code? Is it stored locally on each machine or on shared storage? If locally, how do I update the code when the website changes?

There are a number of ways. You can store the code on shared storage such as an NFS or S3 mount, which makes it very easy to update centrally. That obviously introduces a single point of failure, so people often keep two copies of the code on different storage; a storage failure then takes out at most half of their nodes, and the two copies can also be used for blue/green testing and deployment. Another option is to store it on a distributed file system such as Ceph or similar, though the same caveats apply.
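As a rough illustration of the blue/green idea, here is a minimal sketch that assumes each web node serves from a `current` symlink pointing at one of two code copies. The directory names (`blue`, `green`, `current`) and the `switch_release` helper are hypothetical, and the atomic swap relies on POSIX rename semantics.

```python
import os
from pathlib import Path


def switch_release(root: Path, target: str) -> None:
    """Atomically point root/current at root/<target> ('blue' or 'green')."""
    release = root / target
    if not release.is_dir():
        raise FileNotFoundError(f"no such release directory: {release}")

    # Build the new symlink under a temporary name first.
    tmp_link = root / f".current.{os.getpid()}.tmp"
    if tmp_link.is_symlink() or tmp_link.exists():
        tmp_link.unlink()
    os.symlink(target, tmp_link)  # relative link, resolved inside root

    # Renaming over the old link is atomic on POSIX, so a request never
    # sees a missing or half-updated 'current'.
    os.replace(tmp_link, root / "current")
```

The web server's document root would then be `root/current`; updating the site means refreshing the idle copy and calling `switch_release` again, and rolling back is the same call with the other name.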

The same goes for static content; I'm assuming it resides on shared storage.

Generally this is true. A lot of people use cloud-based storage for static content because it's often 'near', from a network perspective, to their CDNs. It's rare to see content stored directly on web servers these days.

If I'm using a CDN, does it simply cache all accessed static data? And for how long?

That's certainly the base functionality, though they can usually do a great deal more than that. The TTL is almost always configurable on an individual object/file basis.
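One common way to express a per-object TTL is for the origin to send a `Cache-Control` header that the CDN honours. The sketch below assumes that setup; the extension-to-TTL table, the default value, and the `cache_control_for` helper are made up for illustration, and a given CDN may also let you override TTLs in its own configuration.

```python
from pathlib import PurePosixPath

# Hypothetical TTL policy in seconds, keyed by file extension.
TTL_BY_EXTENSION = {
    ".css": 86_400,       # 1 day
    ".js": 86_400,        # 1 day
    ".jpg": 604_800,      # 7 days
    ".png": 604_800,      # 7 days
    ".mp4": 2_592_000,    # 30 days
}
DEFAULT_TTL = 300         # 5 minutes for anything not listed


def cache_control_for(path: str) -> str:
    """Return a Cache-Control header value for the given URL path."""
    extension = PurePosixPath(path).suffix.lower()
    ttl = TTL_BY_EXTENSION.get(extension, DEFAULT_TTL)
    # max-age is for browsers; s-maxage is for shared caches such as a CDN.
    return f"public, max-age={ttl}, s-maxage={ttl}"
```

For example, `cache_control_for("/img/logo.PNG")` yields a seven-day TTL, while `/index.html` falls back to the five-minute default.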

LBs - how can one see if a machine is overloaded (load average), if at all?

There are lots of different ways: open connections, response times, shared resource utilisation stats. LBs can be very 'tuneable', and I have a huge amount of respect for good LB managers.
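To make those signals concrete, here is a minimal sketch of a least-connections choice with response time as a tie-breaker. The `Backend` fields and the `pick_backend` function are hypothetical and not any particular product's algorithm; real load balancers gather these numbers from health checks and live traffic.

```python
from dataclasses import dataclass


@dataclass
class Backend:
    name: str
    open_connections: int
    avg_response_ms: float
    healthy: bool = True  # result of the most recent health check


def pick_backend(backends: list[Backend]) -> Backend:
    """Pick the healthy backend with the fewest open connections.

    Ties are broken by the lower average response time.
    """
    candidates = [b for b in backends if b.healthy]
    if not candidates:
        raise RuntimeError("no healthy backends available")
    return min(candidates, key=lambda b: (b.open_connections, b.avg_response_ms))
```

A machine that fails its health check simply drops out of the candidate list, which is how an overloaded or dead node stops receiving traffic.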

LBs - can I have a cluster of LBs? If so, how does it work?

Yes, literally tiers of them, arranged any way you like. As an example, we use global load balancing to send traffic to a specific datacenter based on a number of factors; once it hits that site it gets split into different service groups (green/blue, for instance) and then goes to the actual service LBs.
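The tiering can be pictured as three lookups in a row. This is only a sketch under assumed names: the topology, the region mapping, the fallback datacenter, and the `route` function are invented for illustration, and plain round-robin stands in for whatever the service LBs actually do.

```python
import itertools

# Hypothetical topology: datacenter -> service group -> servers.
TOPOLOGY = {
    "eu-west": {"blue": ["eu-b1", "eu-b2"], "green": ["eu-g1"]},
    "us-east": {"blue": ["us-b1"], "green": ["us-g1", "us-g2"]},
}
# Hypothetical mapping used by the global tier.
REGION_TO_DATACENTER = {"EU": "eu-west", "NA": "us-east"}
FALLBACK_DATACENTER = "us-east"

# One round-robin iterator per (datacenter, service group) pair.
_round_robin = {
    (dc, group): itertools.cycle(servers)
    for dc, groups in TOPOLOGY.items()
    for group, servers in groups.items()
}


def route(client_region: str, active_group: str = "blue") -> str:
    """Walk the three tiers and return the server that gets the request."""
    # Tier 1: the global LB picks a datacenter.
    dc = REGION_TO_DATACENTER.get(client_region, FALLBACK_DATACENTER)
    # Tier 2: the site LB picks the service group (blue or green).
    if active_group not in TOPOLOGY[dc]:
        raise KeyError(f"unknown service group {active_group!r} in {dc}")
    # Tier 3: the service LB picks an individual server.
    return next(_round_robin[(dc, active_group)])
```

Each tier only needs to know about the layer directly below it, which is what lets the layers be scaled and configured independently.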

What DB would you pick for a YouTube/streaming-like site, and why?

There's no single best-in-class DB, sorry. There are too many factors, cost being a major one (my word, MSSQL and Oracle can get super spendy these days!). The main thing to consider is whether your DB NEEDS referential integrity; if it does, you need a SQL-based DB (there are free ones though: MySQL and PostgreSQL are very popular). If you can design your data right, you can get away with 'NoSQL' databases such as Couchbase/Mongo/Cassandra, and they absolutely FLY, often much quicker than SQL for basic queries, but obviously they're less feature-rich. The other thing is that you can do your DB work entirely in the cloud now: AWS in particular has a strong portfolio of DB types, and Azure obviously has MSSQL as part of its portfolio.
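If "referential integrity" is unfamiliar, the sketch below shows what it buys you. It uses SQLite from the Python standard library purely because it runs anywhere without setup, not because it would suit a streaming site; the `channels` and `videos` tables and the `insert_video` helper are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys once this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")
conn.execute("CREATE TABLE channels (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute(
    "CREATE TABLE videos ("
    " id INTEGER PRIMARY KEY,"
    " channel_id INTEGER NOT NULL REFERENCES channels(id),"
    " title TEXT NOT NULL)"
)
conn.execute("INSERT INTO channels (id, name) VALUES (1, 'demo channel')")


def insert_video(channel_id: int, title: str) -> bool:
    """Insert a video; return False if the channel does not exist."""
    try:
        conn.execute(
            "INSERT INTO videos (channel_id, title) VALUES (?, ?)",
            (channel_id, title),
        )
        return True
    except sqlite3.IntegrityError:
        # The database itself refused the orphaned row.
        return False
```

Here `insert_video(1, "first upload")` succeeds, while `insert_video(999, "orphan")` is rejected by the database. With most NoSQL stores that check becomes your application's job.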

Chopper3
  • Great answers @Chopper3, thank you for taking the time to answer them. One last question, if I may. Say my web service is globally distributed and traffic is routed to the nearest geo location based on IP. What happens if that geo location is offline? Do you need to reroute it manually, or is there some automatic failover? Which products achieve this if I'm building something of my own and not using AWS Route 53? – John Doe Jun 15 '18 at 08:48
  • That's entirely down to your global LB config - we use F5 LBs and NetScalers to do this. – Chopper3 Jun 15 '18 at 09:03
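As a rough picture of what such a global LB configuration amounts to, here is a minimal sketch of automatic geo failover: each client region has an ordered list of sites, and traffic goes to the first one that passes its health check. The site names, the preference table, and `pick_site` are hypothetical and do not reflect F5 or NetScaler configuration syntax.

```python
# Hypothetical preference order per client region, nearest site first.
PREFERENCE = {
    "EU": ["eu-west", "us-east", "ap-south"],
    "NA": ["us-east", "eu-west", "ap-south"],
}


def pick_site(client_region: str, health: dict[str, bool]) -> str:
    """Return the nearest site whose latest health check passed."""
    for site in PREFERENCE[client_region]:
        # A site with no health report is treated as down.
        if health.get(site, False):
            return site
    raise RuntimeError("no healthy site available")
```

With `eu-west` marked unhealthy, European clients are sent to `us-east` without anyone rerouting by hand, and they return to `eu-west` once its health check passes again.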