There is this really weird issue on the internal network, this happens constantly to "random" machines. Machine (client) boots up, goes to some-intern-website-which-is-not-accessible-from-internet.company-name.tld, timeout, retry, timeout, ping the url/host - machine is alive, go to website again - it loads succesfully. I've triple checked the firewall rules on the gateway (but how can it have anything to do with it when switches directly connect internal computers together?), checked the firewall on webserver - ACCEPT everything. both -S and -L (iptables). tracert uses ping, so the host is alive and well when I try to do that. I'm out of ideas and have no way how to troubleshoot this. VPN users also seem to be affected. How do I approach this issue? The webserver is a debian domu xen vm om xenserver. It started after we moved the company to a new location, thrown away a few old switches, bought new ones. Nothing too fancy. Could it be the switches?
Edit:
Per request I have created a simple drawing of the network infrastructure:
And also pfSense and firewall configurations:
- All sysctl values default
- Hostname Portia
- Domain EHV (same as windows 2012 AD)
- DNS Servers 127.0.0.1, 8.8.8.8, 8.8.4.4, wan gateway
- AD also as LDAP auth server configured into pfSense
- EnableReflectionPureNat: YES
- hadp mode
- re0 - WAN, re1 - LAN (see image for ip+subnet)
- WAN -> blockbogons only
- DNS on LAN's (and VPN): Range from 10.x.0.1 to 10.x.255.255 with gateway 10.0.0.1 and wins server 10.0.0.2
- x - LAN = 1, VPN = 2
FW Rules:
No floating rules (0)
NAT Rules:
1:1 and NPt empty
Outbound default/auto
Aliases:
Restarting everything doesn't seem to help.
Aquired packet logs with tcpdump and wireshark on both client and server, client sends syn, server wants to send ack, ack doesn't get to client. server keeps retrying. server can't also ping client. if client pings server, server can communicate with client for an unknown (longer than 1H or as long as the machine is up?) amount of time. Some other pc's (particularily behind other switches) cannot ping client too.
Edit 2: this seems to be happening for all tcp connections (as I just had the same with SSH).
Edit 3: I've managed to somehow isolate the issue, I think?
I changed the ip from a pc from 10.[not zero].x.x to 10.0.x.x and ... behold it works! Why? Why can't we have a 10.1 10.2 etc network until the machines ping each other? How can I determine the culprit?