
What are my duties at an open university (as a senior PHP developer) for tackling network crashes and increasing the availability of our web-based services? I need the big picture.

Mohammad
  • maybe some kind of distributed system, or something like that... – Mohammad May 07 '22 at 10:58
  • You can only achieve a certain availability for a system, and the closer you get to 100%, the more it costs to build in redundancy for the system components. You can never reach 100%, though :) – Honk der Hase May 07 '22 at 11:06
  • What does "crashing" mean? Before you start shaping strategies, you need to identify what causes the problem. – Markus AO May 07 '22 at 12:14
  • Yes, you are right @MarkusAO, but my question is somewhat general. I don't have much experience in this area, but I need it, so I'm asking for general guidance to get started. – Mohammad May 08 '22 at 06:42
  • Have you had crash scenarios? What caused them? Other than that, you need to profile your applications (CPU and memory usage per request, etc.) and project expected usage levels, first of all. Then ensure that your server software (HTTP, database) is configured to match peak-load scenarios, and that your server has specs that accommodate this. Since you don't provide details on what sort of PHP stuff you're running, or what sort of hosting environment you're in, it's hard to provide much in the way of "general but meaningful for your context". – Markus AO May 08 '22 at 12:01

1 Answer


Here are some things to think about from the application code end to the networking end:

  1. Catch it and Cache it. Use a caching layer (Redis or memcached; +1 for Redis) to cache any data that is cacheable, read-only, or doesn't need to be fresh every second. That way you avoid constantly querying the database or other sources for the same data, which reduces the load on those resources. (A minimal caching sketch follows this list.)

  2. Get in a line! Add incoming requests to a queue and respond to clients immediately; the clients can then follow up on the status of their requests via polling, event streams (SSE), or other forms of client-server communication. The idea is to process requests asynchronously off the queue and push the responses back when they're ready. Switching to message queueing requires a serious rethink of your architecture if you're not already there, though. You could also queue only the heavier requests that take longer than usual or need a lot of resources, and leave everything else synchronous. (A producer-side sketch follows this list.)

  3. Vertically scale up. Possibly the simplest (and most expensive) thing you can do is scale your host machines up (more CPUs/cores, RAM, and bandwidth). Note, however, that this only takes you so far; beyond a certain traffic level you'll need to handle the load in a more distributed manner.

  4. Perfectly balanced, as all things should be. You can set up a load balancer (through AWS or DigitalOcean, or your host may provide one) that routes requests between multiple instances of your PHP web server: several on the same machine, plus instances on other machines. This setup is a little more complicated to get right, but it's a tried-and-tested approach to building a load-balanced, distributed setup. (An example config follows this list.)

  5. Not my concern! Separation of concerns: if you have a huge monolithic application, consider splitting it into multiple services, moving toward a microservice architecture. The idea is to separate the different gears that could take down your whole system from each other, into individually diagnosable units. You can then handle scaling and reliability individually for each service.
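
To illustrate point 1, here's a minimal read-through cache sketch using the phpredis extension. The key scheme, TTL, and the loadArticleFromDatabase() helper are made up for illustration:

```php
<?php
// Read-through cache: try Redis first, fall back to the database on a miss.
$redis = new Redis();
$redis->connect('127.0.0.1', 6379);

function getArticle(Redis $redis, int $id): array
{
    $key = "article:$id";                    // hypothetical key scheme
    $cached = $redis->get($key);
    if ($cached !== false) {
        return json_decode($cached, true);   // cache hit, no DB query
    }

    $article = loadArticleFromDatabase($id); // hypothetical DB helper
    $redis->setex($key, 300, json_encode($article)); // cache for 5 minutes
    return $article;
}
```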
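
For point 2, a sketch of the producer side using a Redis list as a bare-bones queue (in practice you might reach for RabbitMQ, Beanstalkd, or your framework's queue component). The queue name and job shape are assumptions:

```php
<?php
// Producer: accept the request, enqueue the heavy work, respond immediately.
$redis = new Redis();
$redis->connect('127.0.0.1', 6379);

$jobId = uniqid('job_', true);
$redis->rPush('jobs:reports', json_encode([  // hypothetical queue name
    'id'     => $jobId,
    'type'   => 'generate_report',
    'params' => ['student_id' => 42],
]));

http_response_code(202); // Accepted: the client polls for the result later
header('Content-Type: application/json');
echo json_encode(['job' => $jobId, 'status_url' => "/jobs/$jobId"]);

// A separate worker process would consume jobs with a blocking pop:
//   [$queue, $payload] = $redis->blPop(['jobs:reports'], 0);
```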
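
For point 4, a minimal nginx config sketch that round-robins requests across PHP app instances; the addresses and ports are placeholders:

```nginx
# Round-robin load balancing across three app server instances.
upstream php_app {
    server 10.0.0.11:8080;   # placeholder app servers
    server 10.0.0.12:8080;
    server 10.0.0.11:8081;   # a second instance on the same machine
}

server {
    listen 80;
    location / {
        proxy_pass http://php_app;
        proxy_set_header Host $host;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    }
}
```

One design consequence to keep in mind: once requests can land on any instance, session state has to move out of local files (e.g. into Redis), or the balancer needs sticky sessions.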

There are other things you can do as well, but these are some good starting points for making your web services more reliable.

Another very important thing is observability/monitoring: you need really good logging and metrics from your servers in order to pinpoint issues.

When you start getting thousands of concurrent users hitting your website on a Sunday night, overloading your database and almost taking down your server, time is of the essence. You need to be able to act and react quickly, and you can only do that if you have information to act upon.

Having access to logs is important, and so is having quick go-to actions you can take (e.g. scaling up resources or adding extra machines as temporary measures to keep things standing while you investigate a more permanent solution).
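
As a starting point for logging, here's a minimal sketch using Monolog 2 (a widely used PHP logging library); the channel name, file path, and log messages are placeholders:

```php
<?php
require 'vendor/autoload.php';

use Monolog\Logger;
use Monolog\Handler\StreamHandler;

// One channel per concern makes logs easier to search when things go wrong.
$log = new Logger('app');    // placeholder channel name
$log->pushHandler(new StreamHandler('/var/log/app.log', Logger::WARNING));

// Record slow queries and failures with enough context to act on quickly.
$log->warning('Slow DB query', ['ms' => 1500, 'endpoint' => '/courses']);
$log->error('Upstream service unreachable', ['service' => 'auth']);
```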

All in all, there's no one-size-fits-all solution. Even huge modern tech companies have products that experience outages now and then. It's almost impossible to guarantee 100% availability, but you can definitely improve your server's reliability and recoverability.

Edit: If you'd like to learn more about general system design, the README at https://github.com/donnemartin/system-design-primer is an excellent resource to get started!

Azarro