I am having trouble with two of my servers that stopped being able to communicate (in a weird way).
Servers are both Microsoft Hyper-V Server 2012 (the ones without GUI).
Name: HVS1
IP address: 10.0.0.11
Hosts a VM called servidor
Name: HVS2
IP address: 10.0.0.12
Hosts a VM called WMS-1
Each host was replicating VMs from the other; this worked fine until about a month ago.
All of my tests for this question share these characteristics:
Both firewalls are disabled (with netsh advfirewall set allprofiles state off), so I know that these are not firewall issues.
I'm always pinging by IP address (although I have hosts entries for their names on each server, so it's not a DNS issue).
I'm always pinging in both directions, so either both work or neither works. I don't have any cases of pings working only one way.
All hosts are configured to respond to Ping.
Everything is IPv4.
Things I've tried:
I can't ping between 10.0.0.11 and 10.0.0.12. This is the basic problem I'm trying to solve, since I expect that once this connectivity works, the rest of my problems will go away.
I can ping from their VMs to the host and back. So, servidor can ping HVS1.
I tried a different hardware switch and it doesn't make any difference.
The higher-level services also don't work: Hyper-V Manager can't connect between the two hosts and gives an RPC error (the RPC service is running).
RDP into HVS1 works, as long as it's not coming from HVS2, but it is very slow, with very frequent 10 second lags. I don't notice anything else slow in the server.
Ping from my laptop to HVS2 works fine.
Ping from my laptop to HVS1 gives 77% loss. Lots of packets time out. This explains the RDP lags. Faulty NIC or cable on HVS1, I hear you think? But...
Ping from my laptop to servidor works perfectly. Note that this is a VM on the HVS1 host, so it's going through the same NIC and cable as above... So???
Ping from HVS2 to HVS1 is 100% loss. The same in the opposite direction.
Ping from servidor to WMS-1 works fine. So VMs on one host can ping VMs on the other, but the hosts themselves can't ping each other.
So, can someone please explain to me how connectivity can work across the same physical connection perfectly in some cases, imperfectly in others, and not at all in others?
And any suggestions for what I can try next? Thanks!
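In case it helps anyone reproduce the loss figures above, here is a rough sketch of how the ping tests could be scripted from PowerShell. It assumes Windows PowerShell as shipped with Server 2012, where Test-Connection returns one object per successful reply; the count of 20 is an arbitrary choice and not the number I used for every test.

```powershell
# Sketch: measure ICMP loss to each host address from the question.
# $count is an arbitrary sample size (an assumption, not from my tests).
$targets = '10.0.0.11', '10.0.0.12'
$count   = 20

foreach ($target in $targets) {
    # In Windows PowerShell, Test-Connection emits one object per reply;
    # failed pings raise non-terminating errors, suppressed here.
    $replies  = @(Test-Connection -ComputerName $target -Count $count -ErrorAction SilentlyContinue)
    $received = $replies.Count
    $lossPct  = [math]::Round(100 * ($count - $received) / $count)
    '{0}: sent {1}, received {2}, loss {3}%' -f $target, $count, $received, $lossPct
}
```

Running the same loop from the laptop, from each host, and from each VM gives a comparable loss figure for every direction.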
UPDATE - Some extra details requested in comments:
C:\>netsh int tcp show global
Querying active state...

TCP Global Parameters
----------------------------------------------
Receive-Side Scaling State : enabled
Chimney Offload State : disabled
NetDMA State : disabled
Direct Cache Access (DCA) : disabled
Receive Window Auto-Tuning Level : normal
Add-On Congestion Control Provider : none
ECN Capability : enabled
RFC 1323 Timestamps : disabled
Initial RTO : 3000
Receive Segment Coalescing State : enabled
Looking at my adapters, I found something I wasn't expecting: for some reason there seems to be a new name for the adapter, Ethernet 4. I don't remember this numbering; it sounds like something got redone by Windows itself and a new number was assigned.
PS C:\> Get-NetAdapter
Name                 InterfaceDescription                ifIndex Status
----                 --------------------                ------- ------
Ethernet 4           Realtek PCI GBE Family Controller        21 Up
vEthernet (External) Hyper-V Virtual Ethernet Adapter #2      23 Up
It's likely that the change to this "new" adapter caused the different LSO behaviour:
PS C:\> Get-NetAdapterLso
Name                 Version       V1IPv4Enabled IPv4Enabled IPv6Enabled
----                 -------       ------------- ----------- -----------
Ethernet 4           LSO Version 1 True          False       False
vEthernet (External) LSO Version 2 False         True        True
Driver information:
PS C:\> Get-NetAdapter -Physical | fl
Name : Ethernet 4
InterfaceDescription : Realtek PCI GBE Family Controller
InterfaceIndex : 21
MacAddress : 00-14-D1-1D-57-11
MediaType : 802.3
PhysicalMediaType : 802.3
InterfaceOperationalStatus : Up
AdminStatus : Up
LinkSpeed(Gbps) : 1
MediaConnectionState : Connected
ConnectorPresent : True
DriverInformation : Driver Date 2011-10-20 Version 8.1.1020.2011 NDIS 6.30
I tried disabling LSO completely for both adapters, but the problem seems to persist :-(
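For reference, disabling LSO on both adapters can be done roughly as follows. This is a sketch rather than a transcript of the exact commands I ran; the adapter names are the ones from the Get-NetAdapter output above, and note that changing offload settings may briefly restart the adapter.

```powershell
# Sketch: turn off Large Send Offload on both adapters, then confirm.
# Adapter names are taken from the Get-NetAdapter output above.
$adapters = 'Ethernet 4', 'vEthernet (External)'

foreach ($name in $adapters) {
    # With no -IPv4/-IPv6 switch, LSO is disabled for both address families.
    Disable-NetAdapterLso -Name $name
}

# Re-read the LSO state to check that the change took effect.
Get-NetAdapterLso -Name $adapters
```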
UPDATE 2: I noticed I had a spare NIC, exactly the same model as the one already installed, and tried swapping it in. The problem persists. I suspect the Hyper-V network stack is somehow corrupted...
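If it helps with diagnosing that suspicion, the virtual switch binding on a host can be inspected with something like the sketch below. I am only assuming these are the relevant things to look at; it shows which physical adapter the external switch is bound to, whether the management OS shares it, and which interface holds the host's IPv4 address.

```powershell
# Sketch: inspect how the external virtual switch is wired up on this host.
# Which physical NIC is the switch bound to, and is it shared with the host?
Get-VMSwitch | Format-List Name, SwitchType, AllowManagementOS, NetAdapterInterfaceDescription

# The host's own virtual NIC(s) attached to the switch.
Get-VMNetworkAdapter -ManagementOS | Format-List Name, SwitchName, MacAddress

# Which interface actually carries the host's IPv4 address.
Get-NetIPAddress -AddressFamily IPv4 | Format-Table InterfaceAlias, IPAddress, PrefixLength
```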