
I currently run a service with very high traffic (about 1,000 connections/second, and this is no longer reducible through optimization). Until a week ago I was on AWS and had tweaked some of my Apache/Nginx configuration to handle that load. There were no issues at all.

I now want to change hosts and went with OVH; the new server's specs are about 4x better than the old one's (128 GB RAM, 24-core latest-generation processor with 30 MB cache...).

Now comes the issue: on the new server I somehow get 503 errors (from Apache) as soon as I pass 600 connections per second.

- First of all: of course I know I must load-balance the connections, and I intend to; but I want a clean config before I replicate it.
- Apache is configured to handle 4,000 concurrent connections, and it does when I stress-test a simple endpoint.
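For context, the concurrency ceiling in Apache 2.4 is set by the MPM directives. I'm not pasting my exact file, but a typical mpm_event setup sized for that limit would look something like this (illustrative values only):

```apache
# mpm_event sized for ~4000 concurrent connections (illustrative, not my actual config)
<IfModule mpm_event_module>
    ServerLimit              125
    ThreadLimit               64
    ThreadsPerChild           32
    MaxRequestWorkers       4000   # must be <= ServerLimit * ThreadsPerChild (125 * 32)
    MaxConnectionsPerChild     0   # never recycle workers based on request count
</IfModule>
```

With a setup like this a simple static stress test passes, which matches what I'm seeing.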

So my hypotheses:

- Either OVH (the new host) blocks my internal connections when they come too often. But they tell me they only block if I exceed 1 GB/s of bandwidth (I don't, far from it).
- Or the Apache configuration is slightly different and makes the server return 503s sooner than before (maybe it doesn't like the 0.5 seconds between connecting to MySQL and getting a result).

Indeed, there is one major difference: on the new server (Ubuntu), my Apache sits behind an Nginx reverse proxy and runs in a Docker container, whereas before it was a simple LAMP stack.

Does someone have an explanation of what is happening? I am totally lost & depressed.

Thank you so much in advance.

  • Welcome to SO! Sorry to hear you got problems like this. Unfortunately, this is not the right place to ask your question. You might receive more help over at [ServerFault](https://serverfault.com/). However, have you tried [mod_status](https://httpd.apache.org/docs/2.4/mod/mod_status.html)? – Paul May 10 '18 at 10:13
  • Ah, I thought by tagging server keywords it would be published on both communities! I will ask there as well and link the answer here if I get one. I had not tried mod_status until now. But looking at it, I don't really know what I am looking for. However, the list of processes never seems to clear... The SS column just seems to grow and grow. See screenshot: [link](https://imgur.com/a/0Rdoa77) – Bastian Jakobsen May 10 '18 at 10:41
  • It was just an idea, there could have been some obvious abnormalities. So, is there a high load anywhere? Database going crazy because of the amount of requests (some slow queries?)? If requests are jumping in from every side and it takes a long time to finish a request (growing SS), I guess it's obvious that the available slots are filling up. What did you stress test? Just the server's main domain or an actual API endpoint? In case of the first: that would underline my theory. If some unoptimized code slows everything down, it's probably not happening on your main domain. – Paul May 10 '18 at 10:58
  • Thanks for your swift answer. In my case there is no real "main domain", only API endpoints. When I stress-test a simple endpoint (only dumping the contents of a file) I can go up to 2,000 connections/s. However, an endpoint with a single SQL SELECT only goes up to 150 connections (the lone query takes 0.011 s). Finally, the worst is when the endpoint connects to Redis (which should handle a very high connection and query rate). But in my logic (I might be wrong), if the problem came from these services, they should throw errors; they don't — only Apache returns 503s. – Bastian Jakobsen May 10 '18 at 11:08
  • Maybe we're getting closer to the issue. The question would be what those services are doing on the system. Maybe the [redis article on latency](https://redis.io/topics/latency) helps you find the cause for Redis, which might help identify the overall bottleneck, if there is one. Nonetheless, I guess the great folks at ServerFault will help you if my suggestions and ideas don't lead to a solution. – Paul May 10 '18 at 11:52
  • So I read the article about Redis and found no issues when testing. I even stress-tested Redis directly while stress-testing it through Apache simultaneously. The direct stress test was fine at 10k connections/s, while the "through Apache" test was struggling at 150 connections/s. I guess we can eliminate the Redis server as the problem. Could it be a limit on the outgoing connections of my Apache server? – Bastian Jakobsen May 10 '18 at 12:51

1 Answer


The answer was the backlog configurations. You can find some on your Linux system (and inside your Docker container), but also in MySQL, MongoDB, etc. When you have high traffic, you need to tweak those settings as well.
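For anyone hitting the same wall, these are the kinds of knobs I mean. The values below are illustrative starting points, not a universal recommendation, and note that the kernel caps every listener's backlog at `net.core.somaxconn`:

```
# /etc/sysctl.conf — kernel listen queue (set on the host AND inside the container)
net.core.somaxconn = 4096
net.ipv4.tcp_max_syn_backlog = 4096

# Nginx — the listen backlog defaults to 511 on Linux; raise it explicitly:
#   server { listen 80 backlog=4096; ... }

# Apache — same idea, in the main config:
#   ListenBacklog 4096

# MySQL (my.cnf) — queue for incoming connection requests:
#   back_log = 1000
```

Apply the sysctl changes with `sysctl -p` and restart each service so it re-opens its listening socket with the larger backlog.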

I also changed the limit on TCP connections; by default these are limited by Linux.
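Concretely, that means the per-process file-descriptor cap (each TCP connection consumes one) and the ephemeral port range used for outgoing connections to MySQL/Redis. A sketch along these lines (illustrative values; `www-data` assumes the Debian/Ubuntu Apache user):

```
# /etc/security/limits.conf — raise the open-file limit for the web server user
www-data soft nofile 65535
www-data hard nofile 65535

# /etc/sysctl.conf — widen the ephemeral port range for outgoing connections
net.ipv4.ip_local_port_range = 1024 65535
# release closed sockets sooner so ports aren't stuck in FIN-WAIT-2
net.ipv4.tcp_fin_timeout = 15
```

In Docker, remember the container gets its own limits: pass `--ulimit nofile=65535:65535` (or the equivalent `ulimits` key in docker-compose) so the raised limit actually reaches Apache.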