
We have the following setup on Azure: an Application Gateway (AG) in front of backend servers IP1 and IP2.

The problem we have been encountering is that AG reports 502 errors for IP1 at random. In a day it can report about twenty 502 errors on seemingly random files. However, IP1's IIS log shows these same files being returned successfully.

IP2 doesn't have this issue at all.

We've tried cross-checking the date/time of each 502 reported by AG against IP1's IIS log to find the corresponding request, but could not find any. Our assumption is that if the request had reached IP1, it would have created an entry in the IIS log.
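
A minimal sketch of that cross-check, assuming the IIS log is in the W3C extended format with a `#Fields:` header and includes the `cs-uri-stem` field. The sample log lines, the file path, the 502 timestamp, and the 150-second search window are all hypothetical. IIS writes W3C logs in UTC by default, so the AG timestamp needs to be converted to UTC before comparing.

```python
from datetime import datetime, timedelta

def parse_w3c_log(lines):
    """Yield one dict per log entry, keyed by the names in the '#Fields:' header."""
    fields = []
    for line in lines:
        line = line.strip()
        if line.startswith("#Fields:"):
            fields = line[len("#Fields:"):].split()
        elif line and not line.startswith("#") and fields:
            values = line.split()
            if len(values) == len(fields):
                yield dict(zip(fields, values))

def entries_near(entries, uri, when_utc, window_seconds=150):
    """Return entries for `uri` logged within `window_seconds` of `when_utc`."""
    window = timedelta(seconds=window_seconds)
    matches = []
    for entry in entries:
        if entry.get("cs-uri-stem") != uri:
            continue
        logged = datetime.strptime(
            entry["date"] + " " + entry["time"], "%Y-%m-%d %H:%M:%S"
        )
        if abs(logged - when_utc) <= window:
            matches.append(entry)
    return matches

# Hypothetical sample data standing in for IP1's IIS log.
sample_log = """#Software: Microsoft Internet Information Services 10.0
#Fields: date time cs-method cs-uri-stem sc-status time-taken
2018-11-22 10:00:01 GET /files/a.pdf 200 35
2018-11-22 10:02:10 GET /files/a.pdf 200 41
2018-11-22 10:30:00 GET /files/b.pdf 200 20
""".splitlines()

# Hypothetical time (UTC) at which AG reported a 502 for /files/a.pdf.
gateway_502_time = datetime(2018, 11, 22, 10, 2, 0)

hits = entries_near(parse_w3c_log(sample_log), "/files/a.pdf", gateway_502_time)
for hit in hits:
    print(hit["date"], hit["time"], hit["sc-status"], hit["time-taken"])
```

If every entry inside the window is a normal 200, the failing request most likely never produced an IIS log entry at all.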

Because the 502 errors are random and reported only by AG, we have not been able to pin down the root cause.

Has anyone encountered this issue before, or does anyone know how to troubleshoot it?

Nancy
K.K
  • 1
    Did you go through all of these troubleshooting steps? https://learn.microsoft.com/en-us/azure/application-gateway/application-gateway-troubleshooting-502 – hujtomi Nov 22 '18 at 22:31
  • Yes, we did, and we even contacted the Microsoft Azure helpdesk. They just mentioned that they did a trace and that AG received an ACK from CD1 but timed out after 2 minutes, which returned the 502 error. We looked into the IIS log of CD1 for the date/time they mentioned but couldn't find any such request. There were many other requests for that file, but all returned normally. – K.K Nov 23 '18 at 02:02
  • Hey, did you guys solve this? We are also facing the exact same issue – Lingaraju E V Jan 07 '19 at 12:36
  • 1
    I believe my team just resolved this. We were running node.js/hapi, and if you use Wireshark to capture the traffic between the application gateway and the server, you'll probably see ACK/RST packets that cause the route to fail and the 502 to occur. We resolved this by adding server.listener.keepAliveTimeout = 120e3; The keepAliveTimeout on your HTTP server (Apache, Node/Express/hapi, nginx) closes an idle connection that the client (the gateway) has not reused within the timeout, which is 5 seconds by default in Node.js. It took 4 days with MS to fix. Hope it helps you all. – JamesMurray Mar 08 '19 at 22:34
  • Yes we managed to resolve this. It was attributed to a config file error. We corrected the error and the errors went away for us. – K.K May 16 '19 at 10:37

1 Answer


I understand that the backend Windows VMs are running IIS and are load balanced by the Application Gateway, and that you are seeing random 502 errors.

You can try the following:

  1. Check the HTTP timeout value, and ask the helpdesk team to check how long the request takes to be processed. If the processing time exceeds the configured HTTP timeout, random 502s will occur.

  2. Check backend health at the time of the issue. If it shows unhealthy, check the corresponding reason.

  3. Check the failed request count for the Application Gateway backend pool during the time of the issue.

  4. Check resource usage; the CPU may be spiking above 90%.

  5. Furthermore, check the IIS logs, including the HTTP error logs, to see whether IIS itself is returning the 502 error code.

  6. If you have WAF enabled, ask the respective team to check the file size limits for upload and download with WAF on.

I strongly suspect that a timeout in the backend HTTP settings is causing this, since the issue is intermittent and IIS shows only 200-399 status codes. Try increasing the timeout and test again.
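
One way to test the timeout theory from the IIS side is to look for requests whose `time-taken` comes close to the configured timeout. The following is a rough sketch, assuming W3C-format IIS logs with the `time-taken` field enabled (IIS records it in milliseconds). The sample lines, the 120-second timeout, and the 80% threshold are hypothetical values for illustration.

```python
def slow_requests(log_lines, timeout_seconds, fraction=0.8):
    """Return (uri, time_taken_ms) for requests at or above fraction * timeout."""
    threshold_ms = timeout_seconds * 1000 * fraction
    fields = []
    slow = []
    for line in log_lines:
        line = line.strip()
        if line.startswith("#Fields:"):
            fields = line[len("#Fields:"):].split()
            continue
        if not line or line.startswith("#") or not fields:
            continue
        values = line.split()
        if len(values) != len(fields):
            continue  # skip malformed lines
        entry = dict(zip(fields, values))
        taken_ms = int(entry["time-taken"])  # milliseconds in IIS logs
        if taken_ms >= threshold_ms:
            slow.append((entry["cs-uri-stem"], taken_ms))
    return slow

# Hypothetical sample log.
sample_log = """#Fields: date time cs-method cs-uri-stem sc-status time-taken
2018-11-22 10:00:01 GET /files/a.pdf 200 35
2018-11-22 10:05:00 GET /files/a.pdf 200 118500
2018-11-22 10:07:30 GET /files/c.zip 200 97000
""".splitlines()

near_timeout = slow_requests(sample_log, timeout_seconds=120)
for uri, taken_ms in near_timeout:
    print(uri, taken_ms)
```

If nothing comes close to the timeout, slow processing on the backend is a less likely explanation.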

I hope this helps :)

  • Hi, thank you for your suggestion. The timeout is set at 2 minutes, and this particular file has gotten a lot more 200s than 502s. CPU seems normal as well. We have new findings: the request did not complete, hence it will not be found in the IIS log. We've found that a "Connection_Dropped" event message is logged in the Httperr.log file. One of Microsoft's support pages says: "A "Connection_Dropped" event message is logged in the Httperr.log file if a user views a Web page, and then the user closes the browser window before the complete response message is sent by IIS." – K.K Nov 27 '18 at 05:09
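
To see how often this happens and for which files, the HTTP.sys error log can be tallied by reason. This is only a sketch, assuming Httperr.log uses its usual W3C-style layout with a `#Fields:` header containing `cs-uri` and `s-reason`. The sample lines, IP addresses, and application pool name are hypothetical. With an Application Gateway in front, the "client" that IIS sees is presumably the gateway rather than the end user's browser.

```python
from collections import Counter

def dropped_connections(httperr_lines, reason="Connection_Dropped"):
    """Count HTTPERR entries with the given s-reason, grouped by cs-uri."""
    fields = []
    counts = Counter()
    for line in httperr_lines:
        line = line.strip()
        if line.startswith("#Fields:"):
            fields = line[len("#Fields:"):].split()
            continue
        if not line or line.startswith("#") or not fields:
            continue
        values = line.split()
        if len(values) != len(fields):
            continue  # skip malformed lines
        entry = dict(zip(fields, values))
        if entry.get("s-reason") == reason:
            counts[entry.get("cs-uri", "-")] += 1
    return counts

# Hypothetical sample standing in for Httperr.log.
sample_httperr = """#Software: Microsoft HTTP API 2.0
#Fields: date time c-ip c-port s-ip s-port cs-version cs-method cs-uri sc-status s-siteid s-reason s-queuename
2018-11-27 04:55:10 10.0.0.4 51234 10.0.0.5 80 HTTP/1.1 GET /files/a.pdf - 1 Connection_Dropped DefaultAppPool
2018-11-27 04:56:00 10.0.0.4 51240 10.0.0.5 80 - - - - - Timer_ConnectionIdle -
2018-11-27 05:01:42 10.0.0.4 51302 10.0.0.5 80 HTTP/1.1 GET /files/a.pdf - 1 Connection_Dropped DefaultAppPool
""".splitlines()

dropped = dropped_connections(sample_httperr)
for uri, count in dropped.most_common():
    print(uri, count)
```

Comparing the timestamps of these entries with the times AG reported its 502s would show whether the two line up.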