
We have a stateless WebApp (sharing state through Azure Redis Cache) that we would like to scale automatically via the Azure auto-scale service. When I enable auto-scale-out, or even when I simply run 3 fixed instances of the WebApp, I get the opposite effect: response times increase dramatically, or I get HTTP 502 errors.
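For context, the app keeps no per-instance state; session data goes through the shared cache roughly along these lines (a minimal sketch in Python/redis-py for illustration only; the cache host name, access key, and key names are placeholders, not our real configuration):

    import redis

    # Minimal sketch of the shared-state pattern: every instance reads and writes
    # the same Azure Redis Cache, so no request depends on a particular server.
    # Host name and access key are placeholders, not real values.
    cache = redis.StrictRedis(
        host="example-cache.redis.cache.windows.net",  # placeholder
        port=6380,                                     # Azure Redis SSL port
        password="<access-key>",                       # placeholder
        ssl=True,
    )

    def get_session(session_id):
        # Any instance behind the load balancer can serve the request,
        # because the session blob lives in the shared cache.
        return cache.get("session:" + session_id)

    def save_session(session_id, blob, ttl_seconds=1200):
        cache.setex("session:" + session_id, ttl_seconds, blob)

If that pattern holds everywhere (no in-memory sessions, no local file writes), adding instances should not change behaviour, which is why the 502s are so surprising.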

This happens whether I use our configured Traffic Manager URL (which worked fine for months with single instances) or the native URL (.azurewebsites.net). Could this have something to do with Traffic Manager? If so, where can I find information on this combination (I have already searched)? And how do I properly leverage auto-scale with Traffic Manager failover/performance routing? I have tried putting Traffic Manager in both failover and performance mode with no evident effect. I can gladly provide links via private channels.

UPDATE: We have now reproduced the situation the "other way around": on the account where we were getting the frequent 5XX errors, we removed all load-balanced servers (only one server per app now) and the problem disappeared. And on the other account, we started balancing across 3 servers (no Traffic Manager configured) and soon got the same frequent 502 and 503 show-stoppers.

Related hypothesis here: https://ask.auth0.com/t/health-checks-response-with-500-http-status/446/8

Possibly the cause? Any takers?

UPDATE

After reverting all WebApps to single instances to rule out any relationship to load balancing, things ran fine for a while. Then the same "502" behavior reappeared across all servers for a period of approx. 15 min on 04.Jan.16, then disappeared again.

UPDATE

Problem reoccurred for a period of about 10 min at 12:55 UTC/GMT on 08.Jan.16 and then disappeared again. Checking log files now for more info.
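For reference, here is a sketch of one way to pull the raw HTTP logs via the Kudu VFS API instead of waiting for an FTP download (the site name, publishing credentials, and the LogFiles/http/RawLogs path are placeholders/assumptions; web server logging to the file system must be enabled):

    import json
    import urllib.request

    # List the raw HTTP log files of a Web App via the Kudu VFS API.
    # Site name and publishing credentials below are placeholders.
    SITE = "example-app"
    USER = "$example-app"              # site-level publishing user
    PASSWORD = "<publishing-password>"

    url = "https://{0}.scm.azurewebsites.net/api/vfs/LogFiles/http/RawLogs/".format(SITE)

    password_mgr = urllib.request.HTTPPasswordMgrWithDefaultRealm()
    password_mgr.add_password(None, url, USER, PASSWORD)
    opener = urllib.request.build_opener(urllib.request.HTTPBasicAuthHandler(password_mgr))

    with opener.open(url) as response:
        entries = json.loads(response.read().decode("utf-8"))

    # Newest files first, so the 502 window is easy to find.
    for entry in sorted(entries, key=lambda e: e["mtime"], reverse=True):
        print(entry["mtime"], entry["name"])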

UPDATE

Problem reoccurred for a period of roughly 90 min at around 11:00 UTC/GMT on 19.Jan.16, also on the .scm. page. This is the "reference-client" Web App on the account with a Web App named "dummy1015". "502 - Web server received an invalid response while acting as a gateway or proxy server."


1 Answer


I don't think Traffic Manager is the issue here. Since Traffic Manager works at the DNS level, it cannot be the source of the 5XX errors you are seeing. To confirm, I suggest the following:

  • Check whether the increased response times are coming from the DNS lookup or from the web request itself (see the sketch below).
  • Introduce Traffic Manager whilst keeping your single-instance / non-load-balanced setup, and confirm that the problem does not reappear.

This will help confirm whether the issue relates to Traffic Manager or to some other aspect of the load balancing.
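For example, something along these lines separates the two timings (a rough sketch; the host names are placeholders for your Traffic Manager and native *.azurewebsites.net names):

    import socket
    import time
    import urllib.error
    import urllib.request

    # Rough split of where the latency goes: DNS resolution vs. the HTTP request itself.
    # Host names are placeholders; substitute your Traffic Manager name and the
    # native *.azurewebsites.net name of the affected Web App.
    def measure(host):
        t0 = time.time()
        socket.getaddrinfo(host, 443)              # DNS lookup (may be served from the local cache)
        dns_time = time.time() - t0

        t0 = time.time()
        try:
            response = urllib.request.urlopen("https://" + host + "/", timeout=30)
            response.read()
            status = response.getcode()
        except urllib.error.HTTPError as err:
            status = err.code                      # a 502/503 here means DNS resolved and the front end answered
        http_time = time.time() - t0

        print("{0}: DNS {1:.3f}s, HTTP {2:.3f}s (status {3})".format(host, dns_time, http_time, status))

    measure("example.trafficmanager.net")          # placeholder
    measure("example.azurewebsites.net")           # placeholder

If the DNS time is normal but the HTTP time blows up (or the request returns 502), the problem is downstream of Traffic Manager.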

Regards,

Jonathan Tuliani, Program Manager, Azure Networking - DNS and Traffic Manager

  • Thanks Jonathan. Good to know that Azure PMs care enough... OK, not Traffic Manager -- it was a long-shot suspicion. I scaled back to single instances to rule out the OAuth/JWT-callback problem and it went away UNTIL THIS MORNING CET, when I had the same problem across all 6 farms for about 15 min, then it disappeared again as fast as it came. May not have had the same cause, of course. Will check with the OAuth provider too... – GGleGrand Jan 05 '16 at 13:29
  • OK, auth0.com does not report any service outages, which does not rule out other, less obvious issues. I am trying to find the appropriate Azure log file(s) that can provide more information. I have logging turned on, although not at the most verbose level to date. Any tips on where to look are appreciated. I can't change the title of the post, sorry :-) – GGleGrand Jan 05 '16 at 14:25