Skype for Business calls behind pfSense dropping after five seconds

Question

I'm having a problem with Skype for Business (Lync) disconnecting behind a pfSense firewall after exactly 5 seconds. It's an odd problem, and I've opened a ticket with Microsoft, but thus far, that has proven fruitless. Here is the information:

Problem Statement:

If a Lync call originates from inside our office, to a remote VPN user, the call connects with two-way audio, but disconnects after exactly 5 seconds with a "network error." If the remote user calls the office user, no problems.
If an office user tries to present/share their desktop, the presentation starts, then stops after exactly 5 seconds with a "network error." Again, remote users can share to office users with no issues.

Relevant Information:

This is a cloud-based Office365 deployment. There is no on-premise server.
Previously, we had a Fortigate firewall. Lync worked fine.
Our remote users connect using OpenVPN running on a pfSense box behind the gateway.
Split tunneling is enabled, so all Internet-bound traffic from the remote user should go directly to the Internet.
No SIP ALG is, or ever has been, enabled on any firewall.
Our ISP changed our public IP address space. Around the same time, we replaced the Fortigate with a UniFi USG Pro, and it worked for a few months.
With no (obvious) changes to configuration, our remote users (connected via VPN) started having problems.
For reasons unrelated to this problem, we decided the UniFi wasn't flexible enough for our needs. We installed a pfSense server as our primary gateway device.
Now, VPN users connect directly to the gateway, rather than a server inside the network.
With this configuration, everything worked fine for a few weeks, so we assumed the problem had been related to the UniFi.
Then, without any configuration changes, the problem reappeared, and it persists today.

Things I've tried:

Changing the internal IP address range for VPN users - No impact.
Set up a VPN server on a different WAN interface (and a different ISP), and had a user connect to that one - No impact.
Tested STUN/TURN from an internal workstation to 52.112.0.75 (Lync TURN server) - Successful.
Tested STUN/TURN from a VPN-connected workstation to 52.112.0.75 (Lync TURN server) - Successful.
Set firewall to allow all outbound traffic from Office network - No impact.
Set firewall to allow all traffic from the VPN interface - No impact.
Set firewall to block and log STUN traffic between the corporate network and the VPN network - No impact.
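
For anyone wanting to repeat the STUN checks in the list above without the PowerShell script, here is a minimal Python sketch of a STUN Binding request and response parser following RFC 5389. It is not the script used for the tests above. The function names are made up for illustration, and whether the Office 365 relay at 52.112.0.75 answers an unauthenticated Binding request is an assumption (it may insist on an authenticated TURN exchange instead).

```python
import os
import socket
import struct

MAGIC_COOKIE = 0x2112A442      # fixed value defined by RFC 5389
BINDING_REQUEST = 0x0001
BINDING_SUCCESS = 0x0101
XOR_MAPPED_ADDRESS = 0x0020


def build_binding_request(txn_id=None):
    """Return (packet, txn_id) for a Binding request with no attributes."""
    txn_id = txn_id or os.urandom(12)
    header = struct.pack("!HHI", BINDING_REQUEST, 0, MAGIC_COOKIE)
    return header + txn_id, txn_id


def parse_binding_response(data, txn_id):
    """Return (ip, port) from XOR-MAPPED-ADDRESS (IPv4), or None if absent."""
    if len(data) < 20:
        raise ValueError("too short to be a STUN message")
    msg_type, length, cookie = struct.unpack("!HHI", data[:8])
    if cookie != MAGIC_COOKIE or data[8:20] != txn_id:
        raise ValueError("not a response to our request")
    if msg_type != BINDING_SUCCESS:
        raise ValueError(f"unexpected STUN message type 0x{msg_type:04x}")
    pos, end = 20, 20 + length
    while pos + 4 <= end:
        attr_type, attr_len = struct.unpack("!HH", data[pos:pos + 4])
        value = data[pos + 4:pos + 4 + attr_len]
        if attr_type == XOR_MAPPED_ADDRESS and len(value) >= 8 and value[1] == 0x01:
            # Port and address are XORed with the magic cookie.
            port = struct.unpack("!H", value[2:4])[0] ^ (MAGIC_COOKIE >> 16)
            addr = struct.unpack("!I", value[4:8])[0] ^ MAGIC_COOKIE
            return socket.inet_ntoa(struct.pack("!I", addr)), port
        pos += 4 + attr_len + (-attr_len % 4)   # attributes are padded to 4 bytes
    return None


def stun_query(host="52.112.0.75", port=3478, timeout=3.0):
    """Send one Binding request over UDP; raises socket.timeout if unanswered."""
    packet, txn_id = build_binding_request()
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(packet, (host, port))
        data, _ = sock.recvfrom(2048)
    return parse_binding_response(data, txn_id)
```

Running stun_query() from an office workstation and again from a VPN-connected one should, if the server replies, return the public address and port each client is seen from, which makes it easy to compare the two NAT mappings.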

This problem happens 100% of the time and is easily reproducible. I've gathered network captures from both sides of the call and sent diagnostic logs to Microsoft. They gave me a huge list of IP addresses to whitelist on my firewall, but I'm not blocking any outbound traffic at the moment. They also showed a log message of a STUN request to 52.112.0.75 failing with a "401 (Unauthorized)" message; however, that failure occurs 3.5 seconds into the call and is followed a few milliseconds later by a successful STUN binding to the same address, so I think it is an expected failure and a red herring. I've also used a PowerShell script to test STUN from the computers on both sides, and both requests were successful.

Right around the time the call fails, I do see a few TCP packets with the RST flag set, coming from the same IP block as the other Lync-related traffic (52.112.0.0/16). Also around this time, I see traffic coming directly from the remote user's IP address: specifically, UDP traffic with a destination port of 3478, which is STUN.
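
To pick those two patterns out of a capture that has been exported to a list of packets, a small filter like the following can help. This is only a sketch: the classify function, its labels, and the sample addresses (taken from documentation ranges) are hypothetical, and only the 52.112.0.0/16 block and UDP port 3478 come from the observations above.

```python
import ipaddress

LYNC_BLOCK = ipaddress.ip_network("52.112.0.0/16")   # block seen in the captures
STUN_PORT = 3478


def classify(src_ip, proto, dst_port, rst=False):
    """Label one packet by source address, protocol, destination port, RST flag."""
    from_lync = ipaddress.ip_address(src_ip) in LYNC_BLOCK
    if from_lync and proto == "tcp" and rst:
        return "lync-tcp-rst"      # reset coming from the Lync address block
    if not from_lync and proto == "udp" and dst_port == STUN_PORT:
        return "peer-stun"         # STUN arriving from somewhere other than Lync
    return "lync-other" if from_lync else "other"


# Hypothetical sample rows: (source IP, protocol, destination port, RST flag)
packets = [
    ("52.112.0.75", "tcp", 50123, True),
    ("198.51.100.20", "udp", 3478, False),
    ("52.112.0.75", "udp", 50124, False),
]
for pkt in packets:
    print(pkt[0], classify(*pkt))
```

Sorting the labelled packets by timestamp around the 5-second mark shows whether the resets precede or follow the direct STUN traffic from the remote user.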

Theories:

Each client connects directly to the Lync server in the cloud. After a few seconds, that server tries to negotiate a direct SIP connection between the two clients (hence the STUN traffic) and fails. Direct connection over a VPN is specifically not recommended by Microsoft, so I don't even know why it's trying to do this, and I would prevent it if I knew how.

This problem has been killing me for weeks, and any input would be greatly appreciated.

Have you compared the logs before and after it stopped working? — Davidw, Sep 14 '18 at 04:32
I have no logs from before it stopped working. One of the problems with the Fortinet is it had a somewhat ineffective logging engine. The current logs don't show anything being blocked by the firewall. — C Hamm, Sep 14 '18 at 13:27
Have you tried using the traffic shaper to give priority to Skype calls? — Davidw, Sep 15 '18 at 05:35
It's definitely not a bandwidth issue. We have a reliable 100Mbit up/down fiber connection, and that wouldn't explain why it drops 100% of the time after exactly 5 seconds. — C Hamm, Sep 19 '18 at 13:42
