After a long time of seeing the issue come and go randomly, even when the server is not under load, I am now fairly confident that the cause is RDP remote audio. The profile on the management server, used to connect to other hosts on the network, has remote audio disabled, but the profile on my PC that I use to connect to the management server does have it enabled.
The remote audio feature has always been spotty. Sometimes it works fine. Sometimes it plays audio back with a 3-second delay. Sometimes it starts stuttering. And often, it drops out after a period of no playback - the audio device is still there, but no sound arrives. Presumably a UDP session timeout. Reconnecting temporarily fixes it, but it also removes and re-adds the audio device, which trips up a bunch of software that assumes that the device is persistent.
This week I've observed a moment where the issue was present for multiple hours. Then my RDP session got briefly disconnected due to network issues. When it reconnected, suddenly the issue was gone. But an hour in, I noticed it happening again. I finally went into my PC profile and disabled remote audio, then reconnected. It's been over a day now, and RDP connections open instantly. The downside is that I'm now without remote audio.
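For anyone who prefers to flip this setting in a saved .rdp file instead of the client UI, here is a minimal sketch. It assumes the connection profile is a plain .rdp file and uses the documented `audiomode:i:` setting (0 = play on this computer, 1 = play on the remote computer, 2 = do not play). The function name `set_audiomode` is my own, and note that files saved by mstsc may be UTF-16 encoded, so read and write them accordingly.

```python
def set_audiomode(rdp_text: str, mode: int = 2) -> str:
    """Return the .rdp file text with the audiomode setting forced to `mode`.

    mode: 0 = play on this computer, 1 = play on remote computer, 2 = do not play.
    """
    if mode not in (0, 1, 2):
        raise ValueError("audiomode must be 0, 1 or 2")

    setting = f"audiomode:i:{mode}"
    out = []
    replaced = False
    for line in rdp_text.splitlines():
        if line.strip().lower().startswith("audiomode:"):
            # Replace the first occurrence and drop any duplicates.
            if not replaced:
                out.append(setting)
                replaced = True
        else:
            out.append(line)

    if not replaced:
        # The setting was absent, so append it.
        out.append(setting)

    # .rdp files conventionally use CRLF line endings.
    return "\r\n".join(out) + "\r\n"
```

Passing `mode=0` restores remote audio playback on the local PC.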
So what I assume is happening is that when establishing a connection, the RDP client on the server is trying to do something with the lost audio session, takes 60/90 seconds to time out, and then I guess temporarily flags it as unavailable. That would explain why I'm not seeing any traffic going between the server and the other host while waiting for this timeout.
EDIT: Further testing has revealed the following behavior:
- If the audio device is idle for more than a few minutes, it malfunctions.
- The malfunction happens on both TCP and UDP, so it is not a NAT timeout issue.
- During the malfunction, the speaker sound test will either indicate volume levels but not make any sound, stall for 10-20 seconds and then play, or stall for 1-2 minutes and then display a failure popup.
I searched for "rdp audio stops working" and found serverfault.com/q/1076031 (audiodg stops after 5min idle and has long startup delay), which points to https://superuser.com/q/994536 (audiodg heavy catroot scan), which in turn points to https://superuser.com/q/584746. One answer implies that a missing embedded signature on l3codeca.acm causes a recrawl of System32\catroot (350 MB in 20000 files on 2012R2) every time the audio device is accessed. If this operation runs at low priority, it would explain the timeouts and playback delays when the CPU is busy.
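To check how large the catalog store is on your own machine, a rough sketch like the following counts files and bytes under a directory. The helper name `tree_stats` is my own, and the default path assumes a standard Windows layout (`%SystemRoot%\System32\catroot`); reading it may need an elevated prompt.

```python
import os


def tree_stats(root: str) -> tuple[int, int]:
    """Return (file_count, total_bytes) for all files below root."""
    count = 0
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            try:
                total += os.path.getsize(os.path.join(dirpath, name))
            except OSError:
                # Skip files that vanish or cannot be read.
                continue
            count += 1
    return count, total


if __name__ == "__main__":
    # Assumed default location of the catalog store.
    system_root = os.environ.get("SystemRoot", r"C:\Windows")
    catroot = os.path.join(system_root, "System32", "catroot")
    files, size = tree_stats(catroot)
    print(f"{files} files, {size / 2**20:.0f} MiB under {catroot}")
```

If the numbers are in the same range as the ones quoted above, a full rescan on every audio device access would plausibly be slow enough to matter.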
I have applied the DisableProtectedAudioDG workaround and I'm now seeing that audio continues to work even after long idle periods, and remote connections open instantly even with enabled audio.
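For reference, this is roughly what the workaround looks like as a .reg file, generated by a small sketch. I'm assuming here that the value is a DWORD named `DisableProtectedAudioDG` under `HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows\CurrentVersion\Audio`, which is how the workaround is commonly described; verify the exact key against the linked answers before importing. The function name `render_reg` is my own.

```python
# Assumed location of the setting; check the linked answers before use.
KEY = r"HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows\CurrentVersion\Audio"
VALUE_NAME = "DisableProtectedAudioDG"


def render_reg(value: int = 1) -> str:
    """Return .reg file text that sets the DWORD to `value` (1 = workaround on)."""
    if not 0 <= value <= 0xFFFFFFFF:
        raise ValueError("value must fit in a 32-bit DWORD")
    lines = [
        "Windows Registry Editor Version 5.00",
        "",
        f"[{KEY}]",
        f'"{VALUE_NAME}"=dword:{value:08x}',
        "",
    ]
    return "\r\n".join(lines)


if __name__ == "__main__":
    print(render_reg(1))
```

Save the output as a .reg file and import it from an elevated prompt. The change likely needs a restart of the Windows Audio service (or a reboot) to take effect, and `render_reg(0)` produces the file to revert it.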