
I'm having a problem with Skype for Business (Lync) disconnecting behind a pfSense firewall after exactly 5 seconds. It's an odd problem, and I've opened a ticket with Microsoft, but thus far, that has proven fruitless. Here is the information:

Problem Statement:

  • If a Lync call originates from inside our office to a remote VPN user, the call connects with two-way audio but disconnects after exactly 5 seconds with a "network error." If the remote user calls the office user, there are no problems.
  • If an office user tries to present/share their desktop, the presentation starts, then stops after exactly 5 seconds with a "network error." Again, remote users can share to office users with no issues.

Relevant Information:

  • This is a cloud-based Office365 deployment. There is no on-premise server.
  • Previously, we had a Fortigate firewall. Lync worked fine.
  • Our remote users connect using OpenVPN running on a pfSense box behind the gateway.
  • Split tunneling is enabled, so all Internet-bound traffic from the remote user should go directly to the Internet.
  • No SIP ALG is enabled, or has ever been enabled, on any firewall.
  • Our ISP changed our public IP address space. Around the same time, we replaced the Fortigate with a UniFi USG Pro, and it worked for a few months.
  • Then, with no (obvious) changes to the configuration, our remote users (connected via VPN) started having problems.
  • For reasons unrelated to this problem, we decided the UniFi wasn't flexible enough for our needs. We installed a pfSense server as our primary gateway device.
  • Now, VPN users connect directly to the gateway, rather than a server inside the network.
  • With this configuration, everything worked fine for a few weeks, so we assumed the problem must have been related to the UniFi.
  • Then, without any configuration changes, the problem reappeared, and it is still present now.

Things I've tried:

  • Changed the internal IP address range for VPN users - No impact.
  • Set up a VPN server on a different WAN interface (with a different ISP) and had a user connect to it - No impact.
  • Tested STUN/TURN from an internal workstation to 52.112.0.75 (Lync TURN server) - Successful.
  • Tested STUN/TURN from a VPN-connected workstation to 52.112.0.75 (Lync TURN server) - Successful.
  • Set the firewall to allow all outbound traffic from the office network - No impact.
  • Set firewall to allow all traffic from the VPN interface - No impact.
  • Set firewall to block and log STUN traffic between the corporate network and the VPN network - No impact.
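
For anyone who wants to reproduce the STUN tests above, here is a minimal Python sketch of a single RFC 5389 Binding Request over UDP. It is illustrative only, not the PowerShell script referenced later in the post; it assumes the server answers an unauthenticated Binding Request on port 3478, and the function names are hypothetical.

```python
import os
import socket
import struct

MAGIC_COOKIE = 0x2112A442
BINDING_REQUEST = 0x0001
BINDING_SUCCESS = 0x0101
XOR_MAPPED_ADDRESS = 0x0020


def build_binding_request(transaction_id: bytes) -> bytes:
    """Build a 20-byte STUN Binding Request header with no attributes."""
    if len(transaction_id) != 12:
        raise ValueError("transaction ID must be 12 bytes")
    return struct.pack("!HHI", BINDING_REQUEST, 0, MAGIC_COOKIE) + transaction_id


def parse_binding_response(data: bytes, transaction_id: bytes):
    """Return (ip, port) from XOR-MAPPED-ADDRESS, or None if not a matching success."""
    if len(data) < 20:
        return None
    msg_type, msg_len, cookie = struct.unpack("!HHI", data[:8])
    if msg_type != BINDING_SUCCESS or cookie != MAGIC_COOKIE or data[8:20] != transaction_id:
        return None
    offset, end = 20, min(len(data), 20 + msg_len)
    while offset + 4 <= end:
        attr_type, attr_len = struct.unpack("!HH", data[offset:offset + 4])
        value = data[offset + 4:offset + 4 + attr_len]
        # Only the IPv4 family (0x01) is handled in this sketch.
        if attr_type == XOR_MAPPED_ADDRESS and len(value) >= 8 and value[1] == 0x01:
            port = struct.unpack("!H", value[2:4])[0] ^ (MAGIC_COOKIE >> 16)
            addr = struct.unpack("!I", value[4:8])[0] ^ MAGIC_COOKIE
            return socket.inet_ntoa(struct.pack("!I", addr)), port
        offset += 4 + attr_len + (-attr_len % 4)  # attributes are 32-bit aligned
    return None


def stun_binding_test(server: str, port: int = 3478, timeout: float = 2.0):
    """Send one Binding Request over UDP; return the mapped (ip, port) or None."""
    tid = os.urandom(12)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(build_binding_request(tid), (server, port))
        try:
            data, _ = sock.recvfrom(2048)
        except socket.timeout:
            return None
    return parse_binding_response(data, tid)
```

Calling `stun_binding_test("52.112.0.75")` from each side returns the public address the server saw, or `None` on timeout. Comparing the two results shows which NAT mapping each client presents to the Lync edge.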

This problem happens 100% of the time and is easily reproducible. I've gathered network captures from both sides of the call and sent diagnostic logs to Microsoft. They gave me a huge list of IP addresses to whitelist on my firewall, but I'm not blocking any outbound traffic at the moment. They also pointed to a log message showing a STUN request to 52.112.0.75 failing with "401 (Unauthorized)." However, that failure is followed by a successful STUN binding to the same address, so I think it is a red herring: the error occurs 3.5 seconds into the call, and the successful query comes a few milliseconds later, so I believe this is an expected failure. I've also used a PowerShell script to test STUN from the computers on both sides, and both requests were successful.
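
For context on the 401: in standard STUN/TURN (RFC 5389 long-term credentials, as used by RFC 5766), a client normally sends its first request without credentials, and the server rejects it with 401 (Unauthorized) plus a REALM and NONCE so the client can retry with credentials. Skype for Business uses Microsoft's own TURN variant (MS-TURN), so assuming it behaves the same way is an assumption. The sketch below decodes a STUN message's method, class, and ERROR-CODE attribute so a logged failure can be read in that light; the function name is hypothetical.

```python
import struct

ERROR_CODE = 0x0009  # STUN ERROR-CODE attribute type
CLASS_NAMES = {0: "request", 1: "indication", 2: "success", 3: "error"}
METHOD_NAMES = {0x001: "Binding", 0x003: "Allocate"}


def describe_stun_message(message: bytes) -> str:
    """Summarize a STUN message as '<method> <class> [<code> (<reason>)]'."""
    msg_type, msg_len = struct.unpack("!HH", message[:4])
    # The 2-bit class is split across bits 4 and 8 of the message type.
    cls = ((msg_type >> 7) & 0x2) | ((msg_type >> 4) & 0x1)
    method = (msg_type & 0x000F) | ((msg_type & 0x00E0) >> 1) | ((msg_type & 0x3E00) >> 2)
    text = f"{METHOD_NAMES.get(method, hex(method))} {CLASS_NAMES[cls]}"
    offset, end = 20, min(len(message), 20 + msg_len)
    while offset + 4 <= end:
        attr_type, attr_len = struct.unpack("!HH", message[offset:offset + 4])
        value = message[offset + 4:offset + 4 + attr_len]
        if attr_type == ERROR_CODE and len(value) >= 4:
            # ERROR-CODE: 21 reserved bits, 3-bit class (hundreds), 8-bit number.
            code = (value[2] & 0x07) * 100 + value[3]
            reason = value[4:].decode("utf-8", errors="replace")
            text += f" {code} ({reason})"
        offset += 4 + attr_len + (-attr_len % 4)  # attributes are 32-bit aligned
    return text
```

For example, an Allocate error response (type 0x0113) carrying ERROR-CODE 401 decodes as `Allocate error 401 (Unauthorized)`, while a plain Binding success (type 0x0101) decodes as `Binding success`.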

Right around the time the call fails, I do see a few TCP packets with the RST flag set coming from the same IP block as the other Lync-related traffic (52.112.0.0/16). Also around this time, I see traffic coming directly from the remote user's IP address: specifically, UDP traffic with a destination port of 3478, which is STUN.
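
One way to line these two patterns up against the 5-second drop is to filter the capture programmatically. This is a minimal sketch assuming the capture has already been exported to simple records; the `Packet` type, its fields, and the `suspicious` function are hypothetical and are not part of any pcap library.

```python
import ipaddress
from dataclasses import dataclass
from typing import Iterable, Iterator, Tuple

OFFICE365_BLOCK = ipaddress.ip_network("52.112.0.0/16")
STUN_PORT = 3478


@dataclass
class Packet:
    """Hypothetical, simplified capture record (not a pcap library type)."""
    time: float        # seconds since the call started
    src: str
    dst: str
    proto: str         # "tcp" or "udp"
    dst_port: int
    rst: bool = False  # TCP RST flag


def suspicious(packets: Iterable[Packet], remote_user_ip: str) -> Iterator[Tuple[float, str]]:
    """Yield (time, reason) for RSTs from 52.112.0.0/16 and STUN sent by the remote user."""
    remote = ipaddress.ip_address(remote_user_ip)
    for p in packets:
        src = ipaddress.ip_address(p.src)
        if p.proto == "tcp" and p.rst and src in OFFICE365_BLOCK:
            yield p.time, f"TCP RST from {p.src}"
        elif p.proto == "udp" and p.dst_port == STUN_PORT and src == remote:
            yield p.time, f"UDP/{STUN_PORT} from remote user {p.src}"
```

Sorting the yielded hits by time shows whether the RSTs and the direct STUN packets cluster just before the 5-second mark.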

Theories:

Each client connects directly to the Lync server in the cloud. After a few seconds, that server tries to negotiate a direct connection between the two clients (hence the STUN traffic), and it fails. Microsoft specifically recommends against direct connections over a VPN, so I don't know why it's trying to do this, and I would prevent it if I knew how.
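
Skype for Business media negotiation is built on ICE, which helps explain why a direct path gets attempted at all: ICE ranks candidate pairs so that direct (host) paths are tried before relayed ones. The sketch below uses the recommended type preferences and priority formulas from RFC 8445; whether Microsoft's ICE variant uses exactly these values is an assumption, and the function names are hypothetical.

```python
# Recommended ICE type preferences from RFC 8445 (assumed, not confirmed for MS-ICE2).
TYPE_PREFERENCE = {"host": 126, "prflx": 110, "srflx": 100, "relay": 0}


def candidate_priority(cand_type: str, local_pref: int = 65535, component_id: int = 1) -> int:
    """Candidate priority: 2^24*type_pref + 2^8*local_pref + (256 - component_id)."""
    return (TYPE_PREFERENCE[cand_type] << 24) + (local_pref << 8) + (256 - component_id)


def pair_priority(controlling: int, controlled: int) -> int:
    """Pair priority: 2^32*min(G,D) + 2*max(G,D) + (1 if G > D else 0)."""
    g, d = controlling, controlled
    return (min(g, d) << 32) + 2 * max(g, d) + (1 if g > d else 0)


def ranked_pairs(types=("host", "srflx", "relay")):
    """All local/remote candidate-type combinations, highest-priority pair first."""
    pairs = [
        (local, remote, pair_priority(candidate_priority(local), candidate_priority(remote)))
        for local in types
        for remote in types
    ]
    return sorted(pairs, key=lambda pair: pair[2], reverse=True)


for local, remote, prio in ranked_pairs():
    print(f"{local:>5} -> {remote:<5} {prio}")
```

With these values, a host-to-host pair always outranks any pair involving a relay, so if each client sees the other's VPN-side address as a host candidate, ICE will check that path before falling back to the TURN relay. Whether that check is what triggers the 5-second drop is the theory above, not something this sketch demonstrates.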

This problem has been killing me for weeks, and any input would be greatly appreciated.

C Hamm
  • Have you compared the logs before and after it stopped working? – Davidw Sep 14 '18 at 04:32
  • I have no logs from before it stopped working. One of the problems with the Fortinet was its somewhat ineffective logging engine. The current logs don't show anything being blocked by the firewall. – C Hamm Sep 14 '18 at 13:27
  • Have you tried using the traffic shaper to give priority to Skype calls? – Davidw Sep 15 '18 at 05:35
  • It's definitely not a bandwidth issue. We have a reliable 100Mbit up/down fiber connection, and that wouldn't explain why it drops 100% of the time after exactly 5 seconds. – C Hamm Sep 19 '18 at 13:42
