
I have a load balanced environment configured in Google Cloud Platform. Behind the load balancer are two servers, which are nearly identical in configuration. One server sits in the US eastern region and one in the US central region. The server in the US eastern region easily handles the entire load on its own, averaging about 45% CPU usage. Whenever I add the server in the central region to the load balancer, its CPU suddenly spikes and stays around 99% usage as long as it is connected to the load balancer.

Additional background: The servers are Windows servers running an ASP.NET Umbraco 7 website. I also have two database servers running MariaDB, one the master, and one a replication slave. The eastern server connects to the master (also in the eastern region). The central server connects to the slave (also in the central region).

Can anyone offer an explanation as to why the central server is struggling?

Things I've tried:

  • I tweaked the balancing parameters to try to get more requests to go to the eastern server, thinking that might relieve some of the stress on the central server.
  • I tried connecting the central server to the master database in the eastern region.
  • I uploaded a fresh copy of the site files in case there was some corruption somewhere causing a problem.
  • I've followed Google's (automated) advice and increased the RAM (which wasn't really stressed to begin with; it never got above 50% usage).
  • I tried spinning up a totally new server, also in the central region, configured from scratch. Same performance issues.

The best I can figure at the moment is that the server struggles to keep up with the Health Checker pings, but then why doesn't the other server struggle? Is there something about being in a different region that is causing the issue?

Things I have yet to try. Feel free to suggest a priority on these:

  • Moving the central server to another region.
  • Moving the central server into the eastern region alongside the other server.
  • Adding a CPU.

I'm trying to avoid the last one because it seems like treating the symptoms rather than finding the underlying issue.

George
  • Well, start doing some analysis. Because MY crystal ball does not have access to YOUR servers. And you provide ZERO relevant information. – TomTom Mar 13 '18 at 14:13
  • Sorry, yeah. My servers don't have a crystal ball port. Trouble is, I've been analyzing this for days and nothing has occurred to me. I'm just hoping for some fresh ideas. I'd be glad to provide any info that anyone would find useful. There just seems to be so much that _could_ be useful that I was afraid of ending up with a book of a question. – George Mar 13 '18 at 14:47
  • Well, any info I find useful. PROVIDE IT. Sadly, without going through a checklist step by step - which any decent admin knows how to do - we do not even know which direction to look. This is why people hire admins. We are not here to be your free admin replacement. We answer specific technical questions. – TomTom Mar 13 '18 at 15:21

1 Answer


To begin with, Google L7 Load Balancers will attempt to route traffic to the backend nearest to the requester. In your case, any request coming from the east coast will go to the us-east backend, while all other requests from North America will go to us-central. This is expected behavior.

You can check the L7LB traffic distribution by going to the Management Console > Network services > Load Balancing and clicking "advanced menu". From there, go to "Backend services" and click on your LB backend. You can then view the requests per second (RPS) per instance within the backend. If you are using two separate backends, you can check each one individually.
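If you prefer the command line, the same information can be pulled with the gcloud CLI. A minimal sketch, assuming a global backend service whose name (`web-backend` here) is just a placeholder for your own:

    # List backend services and find the one attached to your load balancer
    gcloud compute backend-services list --global

    # Inspect its configuration (balancing mode, capacity scaler, attached instance groups)
    gcloud compute backend-services describe web-backend --global

    # See which instances the load balancer currently considers healthy
    gcloud compute backend-services get-health web-backend --global

The console view is still the easiest place to compare per-instance RPS, but the CLI output is handy for spotting a misconfigured balancing mode or capacity setting.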

If the us-central server is receiving a much higher volume of traffic, its CPU usage will be higher.

Concerning the health checks, you have full control over the frequency of the checks (ideally it should match that of the us-east server). You can review your health checks in Compute Engine > Health checks, or from the Load Balancer details screen.
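If you want to compare or adjust the probe settings from the command line, here is a hedged sketch with gcloud; the health check name `lb-health-check` and the interval values are placeholders, not recommendations:

    # List all health checks in the project
    gcloud compute health-checks list

    # Inspect the interval, timeout and thresholds of a specific check
    gcloud compute health-checks describe lb-health-check

    # Example only: relax the probe frequency
    gcloud compute health-checks update http lb-health-check \
        --check-interval=30s --timeout=5s \
        --healthy-threshold=2 --unhealthy-threshold=3

Note that if your load balancer still uses a legacy HTTP health check, the equivalent commands live under `gcloud compute http-health-checks` instead.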

You can always add CPU without adding memory, since CPU is the bottleneck at the moment. However, that only addresses the symptom, not the underlying problem.

The above covers what to look for on the Google Cloud Platform side. If traffic to both instances is roughly equivalent, start monitoring performance on the server itself to see what is maxing out the CPU and confirm that it is in fact IIS and not another process.
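A minimal sketch of that check on the Windows side, assuming the site runs in a standard IIS application pool so the worker process shows up as `w3wp` (counter paths may differ on non-English installations):

    # Sample per-process CPU counters every 5 seconds; Ctrl+C to stop
    Get-Counter '\Process(w3wp*)\% Processor Time' -SampleInterval 5 -Continuous

    # Map each w3wp PID back to its IIS application pool
    & "$env:windir\system32\inetsrv\appcmd.exe" list wp

If w3wp really is the consumer, the investigation moves to the application itself rather than the load balancer.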

Patrick W
  • Thanks Patrick. The traffic is about equivalent and in fact, the eastern server can handle the entire load without a problem. The central server struggles with a partial load. It is IIS that's bogging down the system, and it is the ASP.NET website's worker process that is causing the load. I will take a closer look at the Health Checker configurations and play with them some to see what happens, but what I keep coming back to is that one server is handling the load fine while the other isn't. Both are getting checked by the same Health Checker. – George Mar 13 '18 at 15:53
  • I had been ignoring a separate backend service that had been defined at the beginning of the project but wasn't in use. I was wrongly assuming that not-in-use meant can-ignore. Turns out it had its own health checker that was hitting port 443, causing twice the HC traffic. This was the case for both servers, so I'm still puzzled as to why one struggled so much more than the other, but after deleting that backend service, the load on the central server is much more manageable. I don't think it tells the full story, but I'm not sure that'll be forthcoming without a lot more work. Thanks. – George Mar 13 '18 at 17:32