I'm having a problem with Skype for Business (Lync) disconnecting behind a pfSense firewall after exactly 5 seconds. It's an odd problem, and I've opened a ticket with Microsoft, but thus far, that has proven fruitless. Here is the information:
Problem Statement:
- If a Lync call originates from inside our office to a remote VPN user, the call connects with two-way audio, but disconnects after exactly 5 seconds with a "network error." If the remote user calls the office user, there are no problems.
- If an office user tries to present/share their desktop, the presentation starts, then stops after exactly 5 seconds with a "network error." Again, remote users can share to office users with no issues.
Relevant Information:
- This is a cloud-based Office365 deployment. There is no on-premise server.
- Previously, we had a Fortigate firewall. Lync worked fine.
- Our remote users connect using OpenVPN running on a pfSense box behind the gateway.
- Split tunneling is enabled, so all Internet-bound traffic from the remote user should go directly to the Internet.
- No SIP ALG is, or ever has been, enabled on any firewall.
- Our ISP changed our public IP address space. Around the same time, we replaced the Fortigate with a UniFi USG Pro, and it worked for a few months.
- With no (obvious) changes to configuration, our remote users (connected via VPN) started having problems.
- For reasons unrelated to this problem, we decided the UniFi wasn't flexible enough for our needs. We installed a pfSense server as our primary gateway device.
- Now, VPN users connect directly to the gateway, rather than a server inside the network.
- With this configuration, everything worked fine for a few weeks. We thought the problem must have been related to the UniFi.
- Then, without any configuration changes, the problem reappeared, and it persists now.
Things I've tried:
- Changing the internal IP address range for VPN users - No impact.
- Set up a VPN server on a different WAN interface (and a different ISP) and had a user connect to that one - No impact.
- Tested STUN/TURN from an internal workstation to 52.112.0.75 (Lync TURN server) - Successful.
- Tested STUN/TURN from a VPN-connected workstation to 52.112.0.75 (Lync TURN server) - Successful.
- Set firewall to allow all outbound traffic from Office network - No impact.
- Set firewall to allow all traffic from the VPN interface - No impact.
- Set firewall to block and log STUN traffic between the corporate network and the VPN network - No impact.
This problem happens 100% of the time and is easily reproducible. I've gathered network captures from both sides of the call and sent diagnostic logs to Microsoft. They gave me a huge list of IP addresses to whitelist on my firewall, but I'm not blocking any outbound traffic at the moment. They also showed a log message of a STUN request to 52.112.0.75 failing with a "401 (Unauthorized)" message. However, that failure is followed by a successful STUN binding to the same address, so I think it is a red herring. I've also used a PowerShell script to test STUN from the computers on both sides, and both requests were successful. Also, the 401 error occurs 3.5 seconds into the call, and the successful query comes a few milliseconds later, so I think this is an expected failure.
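For anyone who wants to reproduce the STUN check, here is a minimal Python sketch of a plain RFC 5389 Binding Request. It is not the PowerShell script mentioned above, and the function names (`build_binding_request`, `parse_binding_response`, `stun_probe`) are made up for illustration. It also assumes the server answers an unauthenticated Binding Request with a standard XOR-MAPPED-ADDRESS attribute, which may not hold for every Lync/Skype for Business relay.

```python
import os
import socket
import struct

MAGIC_COOKIE = 0x2112A442      # fixed value defined by RFC 5389
BINDING_REQUEST = 0x0001
BINDING_SUCCESS = 0x0101
XOR_MAPPED_ADDRESS = 0x0020


def build_binding_request(txn_id=None):
    """Return (packet, txn_id) for a Binding Request with no attributes."""
    txn_id = txn_id or os.urandom(12)
    header = struct.pack("!HHI", BINDING_REQUEST, 0, MAGIC_COOKIE)
    return header + txn_id, txn_id


def parse_binding_response(data, txn_id):
    """Return (ip, port) from XOR-MAPPED-ADDRESS, or None if not found."""
    if len(data) < 20:
        return None
    msg_type, msg_len, cookie = struct.unpack("!HHI", data[:8])
    if msg_type != BINDING_SUCCESS or cookie != MAGIC_COOKIE:
        return None
    if data[8:20] != txn_id:
        return None
    pos, end = 20, min(len(data), 20 + msg_len)
    while pos + 4 <= end:
        attr_type, attr_len = struct.unpack("!HH", data[pos:pos + 4])
        value = data[pos + 4:pos + 4 + attr_len]
        # Family 0x01 is IPv4; this sketch ignores IPv6 answers.
        if attr_type == XOR_MAPPED_ADDRESS and len(value) >= 8 and value[1] == 0x01:
            port = struct.unpack("!H", value[2:4])[0] ^ (MAGIC_COOKIE >> 16)
            raw_ip = struct.unpack("!I", value[4:8])[0] ^ MAGIC_COOKIE
            return socket.inet_ntoa(struct.pack("!I", raw_ip)), port
        pos += 4 + attr_len + (-attr_len % 4)  # attributes are padded to 4 bytes
    return None


def stun_probe(host, port=3478, timeout=3.0):
    """Send one Binding Request over UDP; raises socket.timeout if unanswered."""
    packet, txn_id = build_binding_request()
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(packet, (host, port))
        data, _ = sock.recvfrom(2048)
    return parse_binding_response(data, txn_id)
```

Calling `stun_probe("52.112.0.75")` from an office workstation and from a VPN-connected one should show the public address and port each side is mapped to, which makes it easy to see whether the two sides leave through the paths you expect.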
Right around the time the call fails, I do see a few TCP packets with the RST flag set, coming from the same IP block as the other Lync-related traffic (52.112.0.0/16). Also around this time, I see traffic coming directly from the remote user's IP address. Specifically, UDP traffic with a destination port of 3478, which is STUN.
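To pick those two patterns out of a larger capture, something like the following Python sketch could be run over exported packet records. The `Packet` record layout, the `classify` function, and the label strings are hypothetical, chosen only for illustration. The sketch assumes the capture has already been exported to simple per-packet fields, and the peer address used in the usage note is a documentation-range placeholder, not a real address from my captures.

```python
import ipaddress
from typing import NamedTuple, Optional

LYNC_BLOCK = ipaddress.ip_network("52.112.0.0/16")  # block seen in the captures
STUN_PORT = 3478


class Packet(NamedTuple):
    """Hypothetical per-packet record exported from a capture."""
    src: str
    dst: str
    proto: str           # "tcp" or "udp"
    dst_port: int
    tcp_flags: str = ""  # e.g. "S", "A", "R", "RA"


def classify(pkt: Packet, remote_peer: str) -> Optional[str]:
    """Label the two packet patterns described above; return None otherwise."""
    src = ipaddress.ip_address(pkt.src)
    if pkt.proto == "tcp" and "R" in pkt.tcp_flags and src in LYNC_BLOCK:
        return "rst-from-lync"
    if (pkt.proto == "udp" and pkt.dst_port == STUN_PORT
            and src == ipaddress.ip_address(remote_peer)):
        return "direct-stun-from-peer"
    return None
```

For example, `classify(Packet("198.51.100.20", "10.0.0.5", "udp", 3478), "198.51.100.20")` returns `"direct-stun-from-peer"`. Sorting the labeled packets by timestamp would show whether the RSTs arrive before or after the direct STUN attempts.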
Theories:
Each client connects directly to the Lync server in the cloud. After a few seconds, that server tries to negotiate a direct SIP connection between the two clients (hence the STUN traffic), and fails. Direct connection over a VPN is specifically not recommended by Microsoft, so I don't even know why it's trying to do this, and I would prevent it if I knew how.
This problem has been killing me for weeks, and any input would be greatly appreciated.