
Users of our website intermittently get "connection refused" errors when they try to open it. The problem occurs at random, about once or twice a month, and lasts for a few hours each time. While it is happening, almost all connections are rejected with a connection refused error, although a few still succeed.

  • OS: Win 2012 R2 Standard hosted on ESXI 6
  • IIS 8.5
  • Web server is hosting an ASP.NET application.
  • Windows Firewall is on.
  • Average current connections on the server: ~3500 (based on the Web Service\Current Connections performance counter)
  • Total RAM: 40GB
  • CPU: 30 cores, 2.30 GHz

There is plenty of RAM (more than ~60%) and CPU (more than ~70%) available while the problem is happening. We also checked the network firewall; traffic appears to pass through it without trouble, so the problem seems to occur at the server level. We cannot even open the website locally after connecting to the server via Remote Desktop.

We checked for port exhaustion, and that does not seem to be the problem. The number of SYN packets is high, but it is similar to other days when everything is fine.

This is a one-day summary of the HTTPERR log:

s-reason                            COUNT(ALL *)
Timer_ConnectionIdle                462040
Timer_MinBytesPerSecond             27555
Request_Cancelled                   1757
Timer_EntityBody                    428
Forbidden                           247
URL                                 130
Hostname                            117
BadRequest                          102
Connection_Dropped                  96
Client_Reset                        88
Connection_Dropped_List_Full        40
Verb                                10
Header                              7
Connection_Abandoned_By_ReqQueue    1
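For anyone who wants to reproduce a per-reason tally like the one above without a log query tool, here is a minimal Python sketch. It assumes the HTTPERR file is a space-separated W3C-style log with a `#Fields:` header that names an `s-reason` column. The function name `count_reasons` and the sample lines (documentation-range IPs, invented timestamps) are made up for illustration and are not taken from the real log.

```python
from collections import Counter


def count_reasons(lines):
    """Tally the s-reason column of HTTPERR-style log lines.

    Assumption: fields are space-separated and a '#Fields:' header
    line names the columns, one of which is 's-reason'.
    """
    counts = Counter()
    reason_idx = None
    for raw in lines:
        line = raw.strip()
        if line.startswith("#Fields:"):
            fields = line[len("#Fields:"):].split()
            # Remember where s-reason sits; None if the column is absent.
            reason_idx = fields.index("s-reason") if "s-reason" in fields else None
            continue
        # Skip blank lines, other header lines, and data seen before a header.
        if not line or line.startswith("#") or reason_idx is None:
            continue
        parts = line.split()
        if len(parts) > reason_idx:
            counts[parts[reason_idx]] += 1
    return counts


# Hypothetical sample data, not taken from the real server.
SAMPLE = """\
#Software: Microsoft HTTP API 2.0
#Fields: date time c-ip c-port s-ip s-port cs-version cs-method cs-uri sc-status s-siteid s-reason s-queuename
2017-09-04 04:20:11 203.0.113.5 51234 192.0.2.10 80 - - - - - Timer_ConnectionIdle -
2017-09-04 04:20:12 203.0.113.6 51235 192.0.2.10 80 - - - - - Timer_ConnectionIdle -
2017-09-04 04:20:13 203.0.113.7 51236 192.0.2.10 80 HTTP/1.1 GET / 400 - BadRequest -
"""

if __name__ == "__main__":
    for reason, n in count_reasons(SAMPLE.splitlines()).most_common():
        print(f"{reason}\t{n}")
```

To run it against a real file, pass an open file object instead of `SAMPLE.splitlines()`; the file path depends on the server's HTTP.sys logging configuration.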

Any help finding out why we get connection refused when opening the website hosted on this server would be really appreciated.

1 Answer


Are you running in a virtual environment or on a physical machine? (Edit: I just re-read the question and saw ESXI 6, so virtual it is.)

You've got a VMware VM. Are you using the standard install NIC, or the VM provider's specific NIC? (i.e. Intel vs. VMware)

We have a similar issue, but ours is much less persistent when it occurs. (It shows up when an automated script checks the up/down status of 30 sites, but it only affects half a dozen LWP GETs.)

(Sorry, can't use comments yet, don't have the rep for it)

Edit 2

As per the TechNet link Mohammad found, SYN attack protection is on by default in Windows Vista and later. That matches what I found yesterday but, contrary to what I read then, the registry edits in Edit 1 apparently don't make it more aggressive or active.

I've taken the temporary approach of blocking IPs at the firewall to see what happens. Excess SYN_RECEIVED connections drop, and then eventually rise again from another IP (as you would expect).
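To see which remote IPs hold the most half-open connections before deciding what to block, something like the following Python sketch may help. It tallies SYN_RECEIVED entries per remote address from text in the column layout that `netstat -nao` prints on Windows (protocol, local address, foreign address, state, PID). The function name `syn_received_by_ip` and the sample output are hypothetical and use documentation-range addresses only.

```python
from collections import Counter


def syn_received_by_ip(netstat_text):
    """Count SYN_RECEIVED connections per remote IP.

    Assumption: each connection line has the columns
    Proto, Local Address, Foreign Address, State, PID.
    """
    counts = Counter()
    for line in netstat_text.splitlines():
        parts = line.split()
        if len(parts) >= 4 and parts[0] == "TCP" and parts[3] == "SYN_RECEIVED":
            # Strip the port; rsplit keeps bracketed IPv6 addresses intact.
            remote_ip = parts[2].rsplit(":", 1)[0]
            counts[remote_ip] += 1
    return counts


# Hypothetical sample output, not captured from a real server.
SAMPLE = """\
  Proto  Local Address          Foreign Address        State           PID
  TCP    192.0.2.10:80          203.0.113.5:51234      SYN_RECEIVED    4
  TCP    192.0.2.10:80          203.0.113.5:51235      SYN_RECEIVED    4
  TCP    192.0.2.10:80          198.51.100.7:40000     ESTABLISHED     4
  TCP    192.0.2.10:443         203.0.113.9:50000      SYN_RECEIVED    4
  TCP    [2001:db8::1]:80       [2001:db8::2]:50001    SYN_RECEIVED    4
"""

if __name__ == "__main__":
    for ip, n in syn_received_by_ip(SAMPLE).most_common():
        print(f"{ip}\t{n}")
```

On the server itself, the input text could come from `subprocess.run(["netstat", "-nao"], capture_output=True, text=True).stdout` instead of `SAMPLE`. One IP holding a large share of the SYN_RECEIVED entries would match the pattern described in the comments below.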

Edit 1 - [Possibly debunked?]

If you haven't read all the comments below, it looks like this is heading in the direction of a SYN attack (for both of us).

I'm currently trialling the following changes in HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters\ on a development server for testing:

  • SynAttackProtect: https://technet.microsoft.com/en-us/library/cc938202.aspx
  • TcpMaxConnectResponseRetransmissions: https://technet.microsoft.com/en-us/library/cc938208.aspx
  • TcpTimedWaitDelay: https://technet.microsoft.com/en-us/library/cc938217.aspx
  • TcpMaxHalfOpenRetried: https://technet.microsoft.com/en-us/library/cc938213.aspx
  • TcpMaxPortsExhausted: https://technet.microsoft.com/en-us/library/cc938214.aspx

Somewhat like what is detailed here: https://alnitech.com/news/how-to-protect-your-windows-server-from-syn-flood/

Useful note: a list of common TCP/UDP port usages, especially relevant if you're considering enlarging the ephemeral port range (e.g. `netsh int ipv4 set dynamicportrange tcp start=45536 num=20000`): https://en.wikipedia.org/wiki/List_of_TCP_and_UDP_port_numbers
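As a quick sanity check on the `start` and `num` values in that netsh command, the last port of the range is `start + num - 1`, and it cannot exceed 65535, the highest TCP port number. Here is a small Python sketch of that arithmetic; the helper name `dynamic_port_range_end` is made up for illustration, and the sketch only checks the upper bound, not any other limits Windows may enforce.

```python
MAX_PORT = 65535  # highest valid TCP/UDP port number


def dynamic_port_range_end(start, num):
    """Return the last port of a dynamic range given its start and size.

    Raises ValueError if the range would run past port 65535.
    """
    end = start + num - 1
    if end > MAX_PORT:
        raise ValueError(f"range ends at {end}, which is above {MAX_PORT}")
    return end


if __name__ == "__main__":
    # Values from the netsh example above.
    print(dynamic_port_range_end(45536, 20000))  # prints 65535
```

So `start=45536 num=20000` uses ports 45536 through 65535, ending exactly at the top of the port space.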

Mark
  • We use the Intel NIC. Do you mean you get "Connection refused" when you run a Perl script that uses LWP to check IIS site status? – Mohammad Reza Sadreddini Sep 04 '17 at 04:20
  • Yes. It has become much more prevalent lately. And no sooner had I written the above than I got a list of about 20 "Connection refused" errors from LWP. I can only assume it stems from the same thing, as all the wisdom I've read says that the feedback Perl gives comes straight from the destination server. That said, the IIS logs and the FailedHTTPReq logs don't show anything of note. (As an aside, I've set up verbose FailedRequestLogging to see if that sheds any light on this.) – Mark Sep 04 '17 at 04:32
  • Getting back to the point, there is a VMware version of the NIC, e1000 I think? ... although it might be the VMXNET3 NIC. You'd need to get information from your host provider from here. I've reached the edge of my knowledge. *(Note: split the comment into two for readability)* – Mark Sep 04 '17 at 04:35
  • I think I remember now: the e1000 IS the Intel one, and the VMXNET3 is the one you want (the VMware one) – Mark Sep 04 '17 at 05:14
  • We have this problem on all our servers. I checked one of them that uses VMXNET3, and the problem still exists. We have been struggling with this for about 2 years... Next time it happens, we are going to use Microsoft Network Monitor to trace packets and look for a clue. – Mohammad Reza Sadreddini Sep 04 '17 at 09:24
  • I think we're also seeing this in other ways. Our DB server (MySQL) and our web server (IIS 8.5) regularly get short bursts of non-connectivity. The resulting errors include: `"MySQL server has gone away"`, `"Lost connection to MySQL server during query"` and `"Can't connect to MySQL server on '' (10060)"`. (All preceded by `[MySQL][ODBC 5.2(w) Driver]`.) – Mark Sep 05 '17 at 00:14
  • It's been implied that this might be caused during the host migration that the hypervisor does to maintain balance in the force. (where Force = VM platform resources) – Mark Sep 05 '17 at 00:15
  • We don't use a VM cluster, so we don't have any migration scenario. The interesting thing is that we cannot even open the website on the server itself when the problem happens. We also tried to open the website via localhost, but we get random connection refused errors. – Mohammad Reza Sadreddini Sep 05 '17 at 05:29
  • We caught one in the wild today (vs. an answer to a scripted check). Which was out of the ordinary. Ours don't usually last long enough for us to double check via a browser. And in this wild case today, it didn't last long enough for us to VPN to the server and check it from localhost. (Mind you, we have 100's of sites, localhost isn't bound to anything.) – Mark Sep 06 '17 at 00:11
  • One thing I've been meaning to ask you is, do you have Dynamic IP Restrictions set up? `[Site or Server] > IP Address and Domain Restrictions > [Right hand col'] Edit Dynamic Restriction Settings.` If you do, check the box `Enable Logging Only Mode` to see if this makes the issue "go away". – Mark Sep 06 '17 at 00:13
  • No, we don't use IIS Dynamic IP Restrictions. We use a similar rule on the firewall in front of the web server, and the firewall reports nothing special when this problem happens. A very interesting point is that even sites without any active connections are affected, so I think the problem is not related to heavy load. – Mohammad Reza Sadreddini Sep 06 '17 at 03:04
  • Yesterday the problem was happening all day long, and I think we finally found at least one cause of it after 2 years of struggling! We checked SYN packets on the server again and noticed an unusual number of SYN_RECEIVED connections. Then we blocked the source IP of those SYN packets. Dear Mark, could you please check for SYN packets by running `netstat -nao | findstr SYN_RECEIVED` from a command prompt when the problem happens? You should see only a handful of SYN_RECEIVED entries, even on a heavily loaded web server. A high number could be a symptom of the problem. – Mohammad Reza Sadreddini Sep 06 '17 at 03:12
  • You'll never guess what. There was a mass of SYN_RECEIVED entries. All from an IP Address that traces back to Contabo GmbH in Germany. I believe they call it: Port Exhaustion (or maybe Ephemeral Port Exhaustion). Will look into this further, and post back. Nice catch Mohammad :) – Mark Sep 06 '17 at 06:27
  • FYI: SysInternals TcpView ( https://docs.microsoft.com/en-us/sysinternals/downloads/tcpview ) will allow you to close connections manually. Answer by Robert here: https://serverfault.com/questions/710774/port-exhaustion-and-iis-8-on-server-2012-r2 enables you to expand the number of TCP and/or UDP ports you have available to be used in this situation. (**Note:** I'm thinking this would allow more concurrent traffic, so just be mindful of that.) – Mark Sep 06 '17 at 07:02
  • Hi - I have updated my answer with more information - found when digging into this whole SYN Attack business. Read the edit for more info, but it seems that Windows has built in SYN Attack protection. – Mark Sep 07 '17 at 05:08
  • Based on this [link](http://blogs.technet.com/b/nettracer/archive/2010/06/01/syn-attack-protection-on-windows-vista-windows-2008-windows-7-and-windows-2008-r2.aspx), SYN attack protection cannot be disabled on Win 2012. Also, this problem does not have the symptoms of a SYN flood attack. We are investigating further and will post back when we have an update. – Mohammad Reza Sadreddini Sep 07 '17 at 05:28
  • I had read things along the lines of "it's always on" but also read that updating/adding `SynAttackProtect` to 1 made it more aggressive. Interestingly, I also read there was a lot of misinformation out there on this topic. I will continue digging at my end. – Mark Sep 08 '17 at 06:22