
I have a weird issue. On a server farm I manage, User Profile Disks randomly stop working, causing users who log in to lose access to their desktop, including the Administrator user.

Server configuration
All servers are virtualized using Proxmox in a clustered setup. The following servers are present:

  • DC: This server primarily functions as a domain controller and fileserver.
  • S: An app server that functions as a terminal server but is used only for certain apps. This server is not utilized much.
  • TS (6x): Six terminal servers that are configured identically, running the same software with the same Group Policies applied. Office and Teams are both installed, users have E3 licenses, and single sign-on is set up with ADSync.
  • WEB: Broker server that is also used to start an RDP connection by visiting a web page.

All virtual servers run Windows Server 2016 and are fully up to date with Windows Update.

Added security: 2FA using Duo Mobile for both RDP Web and RDP Gateway sign-in.

The User Profile Disks are stored on the DC.

The problem
Users log in, the broker assigns a server based on least usage, and the servers fill up with users. At a random moment, usually in the morning, a server stops working correctly. The next user who logs in does not get their User Profile Disk loaded and gets a black screen with only a recycle bin on it, and the Start menu does not work. We can log the user out from the DC, but if they log in again and get sent back to the same problematic server, it happens again. If they get sent to another server, they log in normally.

Currently, whenever this happens, we disable logins for this TS on the broker server and schedule a reboot. A reboot always fixes the issue, but because users who logged in before the problem started are still working on that server, we can't just reboot it at that moment. That said, I want to fix the issue so it does not come back.

When it happens and I log in as Administrator, I get an error message that the location C:\Users\Administrator\Desktop is not available. As Administrator, I can access Task Manager, but the Start menu does not work. I can also press CTRL-ALT-END to log off.

If I go to C:\Users\ I see shortcuts (symlinks). The one for Administrator does not work anymore at that point. I can delete it manually, or it sometimes gets deleted when I log out, but once deleted, it does not come back. I also see lots of temporary profiles at this point. When things do work, the symlinks are created and I can double click them to access the folder.
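To spot the dead entries without clicking through them one by one, something like the following could be run on the affected terminal server. It is only a minimal sketch: the function name `find_broken_links` is made up for illustration, and it assumes the entries under C:\Users are symbolic links that Python's `os.path.islink` recognizes. If they are a different kind of reparse point, the check would need adjusting.

```python
import os


def find_broken_links(users_dir):
    """Return names of entries in users_dir that are links with an unreachable target."""
    broken = []
    for entry in sorted(os.listdir(users_dir)):
        path = os.path.join(users_dir, entry)
        # islink() inspects the link itself; exists() follows the link,
        # so it is False when the target can no longer be reached.
        if os.path.islink(path) and not os.path.exists(path):
            broken.append(entry)
    return broken


# Example call on the terminal server (path taken from the description above):
#   print(find_broken_links(r"C:\Users"))
```

Running it on a healthy server and on a server in distress would show whether the broken entries are limited to the users who cannot load their profile.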

I can't find anything useful in the EventLog, but then again, I don't know what I'm looking for, so I may have missed something obvious. I did check all errors though, and did not find anything that could explain what is happening.

EDIT: I have now found an eventID that may be the root cause, but I have no idea why this is happening.

EventID: 158 Disk 14 has the same disk identifiers as one or more disks connected to the system. Go to Microsoft's support website (http://support.microsoft.com) and search for KB2983588 to resolve the issue.

I've looked through all the GUIDs of the profile disks, and they are all unique.
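A uniqueness check like this can also be scripted so it is easy to repeat. The sketch below assumes the identifiers have already been collected into a mapping of disk file name to identifier string. How they are collected is not covered here, and `find_duplicate_ids` is a hypothetical helper name.

```python
from collections import defaultdict


def find_duplicate_ids(disk_ids):
    """Map each identifier shared by more than one disk to the sorted disk names."""
    by_id = defaultdict(list)
    for name, disk_id in disk_ids.items():
        # Normalise whitespace, braces and case so "{ABC}" and "abc" compare equal.
        by_id[disk_id.strip().strip("{}").lower()].append(name)
    return {i: sorted(names) for i, names in by_id.items() if len(names) > 1}
```

An empty result means every identifier in the mapping is unique. Any key that does appear lists the disks that share it.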

LPChip
  • Are there any open handles for the profiles? – Greg Askew Jul 10 '20 at 12:40
  • No, they close the moment I sign out. When I connect to another server the handle is recreated and it works. If I connect back to the troubling server, it has issues. – LPChip Jul 10 '20 at 14:06
  • How are you validating the handles? – Greg Askew Jul 10 '20 at 14:20
  • On the DC, I go to computer management, open files and check whether there is a lock on the profile disk. But it seems the moment the problem appears, the terminal server is not even loading the profile disks anymore, and no lock happens on the disk either. – LPChip Jul 10 '20 at 16:06
  • I was referring to the local copy of the profile. You can use Sysinternals handle.exe to check for open handles. You can redirect the output to a file and search for references to the problem profiles. https://docs.microsoft.com/en-us/sysinternals/downloads/handle – Greg Askew Jul 10 '20 at 16:15
  • Thanks! I'll check that out Monday when I'm back at work. :) – LPChip Jul 10 '20 at 17:28
  • The problematic servers have been rebooted during the weekend, so the problem went away. I have to wait a few days for it to reoccur before I can test. Will keep you posted. – LPChip Jul 13 '20 at 06:39
  • I have a problematic server again. Do I run handle on that server or on the DC that has the profile disk? – LPChip Jul 16 '20 at 08:49
  • I ran handle on both the DC and the terminal server with the ID of the disk that generated the warning. For the DC, it gives me 4 handles, all by the system process. On the terminal server, it gives me 2 handles, both by the system process. The path on the TS shows up as: \Device\Mup\xxxxx\UPD$\UVHD-S-1-5-21-xxxx-xxxx-xxx-1192.vhdx – LPChip Jul 16 '20 at 09:00
  • Hmm, I may have to disregard my previous comments. I enabled login on the server to do some more testing and now the problem seems to have vanished by itself. It is possible a troubled user disconnected in the meantime. – LPChip Jul 16 '20 at 09:12
  • Hmm... I just created a view based on event ID 158 and I see this event on all servers, including working servers, whenever a user logs in. So this event ID does not seem to be the cause. When the servers go into distress again, I'll check the handles for a user whose profile I know won't load. – LPChip Jul 16 '20 at 09:20
  • I have a problematic server again. Using handle, I get 3 handles on the DC, but the TS does not give me any handle. If the user tries to log in, their profile disk is not loaded. I can, however, log in as Administrator. It seems to be selective about which profile disks can be loaded and which cannot, but it affects multiple users. – LPChip Jul 17 '20 at 07:15
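
The approach suggested in the comments (redirect the handle.exe output to a file and search it for references to the problem profiles) could be sketched roughly as follows. The helper name `lines_mentioning` and the file name `handles.txt` are assumptions for illustration. The sketch does not rely on any particular handle.exe output layout; it only does a case-insensitive substring match per line.

```python
def lines_mentioning(handle_output, needle):
    """Return the lines of saved handle.exe output that contain needle, ignoring case."""
    needle = needle.lower()
    return [
        line.strip()
        for line in handle_output.splitlines()
        if needle in line.lower()
    ]


# Example use, assuming the output was saved to a hypothetical handles.txt
# (the encoding of a redirected file depends on the shell, so adjust as needed):
#   with open("handles.txt", errors="replace") as f:
#       for hit in lines_mentioning(f.read(), "UVHD-S-1-5-21"):
#           print(hit)
```

Searching for the user's SID fragment rather than the full path catches the profile disk whether it shows up under a UNC path or a local device path.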
