5

We have 5 Windows 2003 R2 SP2 Std x64 terminal servers that are set to reboot each night all within 45 mins of each other. Frequently at least one of them will not respond to RDP requests after reboot. If I connect to console I can login just fine. Netstat shows TS listening on 3389 etc. Only way I am able to get them to respond again is to reboot manually.

All terminal servers show the following errors in the event log after reboots (however, not all of them stop responding; most work fine post reboot):

Event ID 5719 - Error - Netlogon - This computer was not able to set up a secure session with a domain controller in domain DOMAIN due to the following: There are no logon servers available.

Event ID 4321 - Error - NetBT - The name "DOMAIN :1d" could not be registered on the interface with IP address [IP address]. The machine with the IP [IP address of domain controller] did not allow the name to be claimed by this machine.

However, those events show up on the machines that successfully reboot as well. Can someone please assist me with troubleshooting this problem? Like I said, it doesn't happen every time or on every server. Only sometimes are one or two servers affected. Very frustrating.

Thanks for any assistance!

floyd
  • Why are you rebooting every night? – DanBig Jul 07 '12 at 22:46
  • Unfortunately this is out of my control, I believe it is due to the belief that the servers run out of resources after continued use. Forces all users to disconnect and have all resources opened each business day. – floyd Jul 08 '12 at 05:53
  • You can go into the Computer Management and set timeout limits so that all of the users accounts are logged off automatically after a set amount of time. But @DanBig is right in asking "Why are you rebooting every night"... you shouldn't be. You should fix whatever is wrong, instead. – David W Jul 08 '12 at 22:03
  • I realize that we shouldn't be rebooting every night. Unfortunately, as I mentioned, this is out of my control. I think it's reasonable to expect an OS to be able to reboot nightly without running into these issues as well. – floyd Jul 08 '12 at 22:20
  • I think it's reasonable to expect an OS to be able to go without rebooting every night :). But as you said, it's out of your control. My feeling is just that you should avoid these things because in larger networks the reconnecting and reestablishing of all links can be more troublesome. Maybe you should have a word with the people who are in charge of this. To get back to the problem: it seems like a problem with your Domain Controller. – Christopher Perrin Jul 08 '12 at 23:00
  • Can you ping the server after it won't RDP? My bet is that it's a network issue. – Fergus Jul 10 '12 at 01:15
  • Yes, I can ping the server. I can even use remote mmc's from the other servers and view event logs, query running services etc. When querying service it shows terminal services running, when I use psexec and do a netstat -an, it does not show anything listening on 3389. If I hook a console to the server I can even login to the domain! It's crazy! :) Rebooting again almost always corrects the issue. – floyd Jul 10 '12 at 03:04
  • 1. This might sound trivial, but are the domain controllers or DNS servers also rebooted every night, causing these problems? 2. Can you try to reboot the servers in another timeframe (e.g. schedule the reboot 3 hours later on all servers) and see if this still happens? 3. Do these servers use DHCP? If so, try a static IP instead. – ZEDA-NL Jul 10 '12 at 08:19
  • The DC is not rebooting. The servers and DC are all using static IPs. – floyd Jul 10 '12 at 14:31
  • I know you've said it is out of your control, but Terminal Services Management has the ability to log out disconnected or idle users as well. The options are available in Terminal Services Configuration -> Connections -> RDP-TCP -> Properties -> Sessions. Setting the limits to 4 hours helped keep our 2003 Terminal Server up and happy. OK...all that said, when the error occurs, if you restart "Terminal Services" in services.msc, does the issue go away? It could be trying to start the service before being attached to the domain and barfing. – MikeAWood Jul 12 '12 at 00:47
  • Services.msc does not allow you to restart terminal services in 2003, I have tried however to kill and restart the svchost process hosting termservices, and then restarting termservices and this also does not work. – floyd Jul 12 '12 at 04:02
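
The `netstat -an` check mentioned in the comments can be scripted so the result is recorded after every reboot. The following is only a sketch in Python: it assumes the usual Windows column layout (Proto, Local Address, Foreign Address, State), and the sample text and the `10.0.0.21` address are made up for illustration, not taken from the affected servers.

```python
def listening_ports(netstat_output):
    """Return the set of TCP ports that appear in LISTENING state."""
    ports = set()
    for line in netstat_output.splitlines():
        fields = line.split()
        # Assumed layout: Proto, Local Address, Foreign Address, State
        if (len(fields) >= 4
                and fields[0].upper() == "TCP"
                and fields[3].upper() == "LISTENING"):
            # Local address looks like 0.0.0.0:3389; keep the text after the last colon
            port_text = fields[1].rsplit(":", 1)[-1]
            if port_text.isdigit():
                ports.add(int(port_text))
    return ports


# Made-up sample resembling a server in the failed state (no 3389 line)
SAMPLE = """\
  Proto  Local Address          Foreign Address        State
  TCP    0.0.0.0:135            0.0.0.0:0              LISTENING
  TCP    0.0.0.0:445            0.0.0.0:0              LISTENING
  TCP    10.0.0.21:139          0.0.0.0:0              LISTENING
  UDP    0.0.0.0:445            *:*
"""

if __name__ == "__main__":
    # True would mean the RDP listener is present in the captured output
    print(3389 in listening_ports(SAMPLE))
```

Feeding it the text captured through psexec gives a yes/no answer per server that can be logged next to the reboot time.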

4 Answers

2

Sounds like a problem with TS services on the impacted servers. Maybe they're hung, or waiting on a response from the DC that got lost or garbled on the network, or failed to start correctly when the OS booted, etc.

  1. First thing I'd do is set the TS services to delayed start up, in case it's an OS or machine boot issue. That starts the service after almost everything else, so its dependencies should already be running and it won't conflict with whatever else is starting at the same time.
  2. Failing that, I'd use a scheduled task to restart the service a couple minutes after the OS boots up. (Would take a little bit of guesswork to schedule it right, based on reboot time, machine boot speed and OS load speed.)
  3. Investigate the NICs on the machines? Is it possible that the cause is outdated drivers or firmware, and updated software (like Windows Updates and any other patches you've [hopefully] applied) conflicting with each other from time to time?
  4. Failing that (and maybe anyway, to try to resolve the root cause rather than just alleviate the symptom), I'd do a reinstall (uninstall, then install fresh) of Terminal Services on the impacted servers. I've had this kind of issue, absent Event ID 4321, and a reinstall usually resolves it, at least when the problem is with the TS services on the server and not caused by a networking or domain controller issue.
  5. (Maybe do this before #4) Troubleshoot this from the Domain Controller. There is a reason that the Eventlog is telling you the server can't contact a logon server and the Domain Controller isn't allowing the hostname to be assigned to the indicated interface. This can be caused by domain or Domain Controller settings. Look on the DC to see if there are any indications of that. (Don't forget to look for GPO settings, startup scripts and the like too.)
  6. (Maybe do this before #4 too) Troubleshoot this from a networking perspective. Is it possible the network is occasionally mangling the traffic between these servers and the DC, causing the authentication and name assignment problems you're seeing in the server Eventlogs?
  7. (Maybe do this before anything) Try to convince your bosses (or whoever does "control" the nightly reboots) that the nightly reboots are what's causing this, and/or that this is "expected behavior" when engaging in the dumbassed practice of nightly server reboots. Or, if you fix it, that the fix will stop working unless the reboots stop or decrease in frequency. You'll get the added benefit of not having to replace your servers in a couple years after the added stress of booting causes a hardware fault. :/
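
For the timing guesswork in step 2, one option is to poll the RDP port after boot instead of picking a fixed delay. This is a rough Python sketch, not a tested fix: it only detects whether something accepts TCP connections on 3389, and the attempt count, the delays, and the `connect`/`sleep` parameters are my own assumptions, added so the loop can be tried without a real server.

```python
import socket
import time


def wait_for_port(host, port=3389, attempts=10, delay=30.0, timeout=5.0,
                  connect=socket.create_connection, sleep=time.sleep):
    """Return True once a TCP connection to host:port succeeds, else False.

    connect and sleep are injectable so the retry loop can run without a network.
    """
    for attempt in range(attempts):
        try:
            conn = connect((host, port), timeout)
        except OSError:
            # Refused, timed out, or unreachable: pause, unless this was the last try
            if attempt < attempts - 1:
                sleep(delay)
            continue
        conn.close()
        return True
    return False
```

Run from a monitoring machine a few minutes after the scheduled reboot, something like `wait_for_port("ts01")` (where `ts01` is a placeholder host name) returning False would be the cue to alert someone or trigger whatever recovery step you settle on.
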
HopelessN00b
0

I've seen those errors in our Windows 2003 SP2 TS server. It's rebooted every night (just like yours) for legacy app compatibility reasons.

I assume you've already checked this, but in the past I had some hard drive space problems that led us to a similar scenario.

By the way, is any of those TS servers also a DC?

beiro
0

I found a lot of KB articles from Microsoft that hinted at NIC issues and, as you said, all servers exhibit the same errors. I think that the problem is in your switches. If you have a managed switch, you must disable Spanning Tree or enable PortFast (on Cisco).

Here are the commands to do it in a Cisco Catalyst Switch:

config terminal
interface Gi1/0/19
spanning-tree portfast

Note: to turn it off, use the command:

no spanning-tree portfast

For reference Cisco
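
If several switch ports need the same change, the command sequence above could be generated per port rather than typed by hand. A small Python sketch under assumptions: `Gi1/0/19` is the port from this answer, `Gi1/0/20` is a made-up second port, and the output is meant to be reviewed and pasted into the switch, not sent automatically. PortFast is intended for ports that connect to end hosts such as these servers, not to other switches.

```python
def portfast_commands(interfaces, enable=True):
    """Build the IOS command lines that turn PortFast on or off per interface."""
    setting = "spanning-tree portfast" if enable else "no spanning-tree portfast"
    lines = ["configure terminal"]
    for name in interfaces:
        lines.append("interface " + name)
        lines.append(" " + setting)
    lines.append("end")
    return lines


if __name__ == "__main__":
    # Gi1/0/19 comes from the answer; Gi1/0/20 is hypothetical
    print("\n".join(portfast_commands(["Gi1/0/19", "Gi1/0/20"])))
```

Passing `enable=False` produces the matching `no spanning-tree portfast` lines for rolling the change back.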

mgorven
Kalatzis Stefanos
0

It looks like you are not using static IP addresses, but it could also be an imaging problem. I highly suggest that you set up static IP addresses if you haven't already. Each terminal server should have its own static IP address and be manually configured to point to the proper domain controller.

Event ID 4321 - Error - NetBT: this specific error could be several things. I have a feeling that the terminal servers you have were imaged. There is a good chance that after they were imaged the virtual network adapter was not re-made; thus, the NIC info is exactly the same for all the servers. If they were imaged and this is the case, I suggest copying what information you have for the virtual adapters (take a photo of it or something... IP address info, DNS and WINS info, DNS suffixes, etc.) and then re-creating the virtual adapter. All the terminal servers could be trying to communicate and the network gets confused... as 3 servers have the same NIC info. Also, be sure to check all your information. Make sure your subnet mask is correct (probably 255.255.255.0).

I have had a situation much like this where I seemed to be able to use the computers directly, but I was not able to remote in. I found that due to imaging the machines, my virtual adapters needed to be re-made. I hope this helps!
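
One way to check the "same NIC info" theory is to collect each server's adapter details (for example from `ipconfig /all`) and look for values that show up on more than one machine. The Python below is only an illustration: the server names, IPs and MAC addresses are invented, and the `shared_values` helper is hypothetical.

```python
from collections import defaultdict


def shared_values(inventory, field):
    """Map each value of `field` used by more than one server to those servers."""
    owners = defaultdict(list)
    for server, nic in inventory.items():
        # Compare case-insensitively, since MACs are often written in mixed case
        owners[nic[field].lower()].append(server)
    return {value: sorted(names) for value, names in owners.items() if len(names) > 1}


# Made-up inventory; ts01 and ts03 deliberately share a MAC address
INVENTORY = {
    "ts01": {"ip": "10.0.0.21", "mac": "00-11-22-33-44-01"},
    "ts02": {"ip": "10.0.0.22", "mac": "00-11-22-33-44-02"},
    "ts03": {"ip": "10.0.0.23", "mac": "00-11-22-33-44-01"},
}

if __name__ == "__main__":
    print(shared_values(INVENTORY, "mac"))
    print(shared_values(INVENTORY, "ip"))
```

An empty result for both fields would suggest the adapters are unique and the cause lies elsewhere.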

Patrick
  • I also suggest making sure your SID is not the same for all servers. If you just did a basic image without selecting for a new SID then they will be the same and can also cause errors: http://technet.microsoft.com/en-us/sysinternals/bb897418.aspx This is Microsoft's answer to new SIDs but I suggest researching to find what is best for your environment. The SID might not be the cause BUT if from an image can be helpful to go ahead and fix. It takes only 30 secs or so for the program to run and helps make sure machines on the network are seen as unique and not getting confused. – Patrick Jul 10 '12 at 18:40
  • Static IPs are being used, and the SIDs are not the same. Thanks though. – floyd Jul 11 '12 at 01:02
  • Hmmm, the only thing I can think of off the top of my head is, again, the possibility that the machines were imaged and the virtual adapters were a part of that image. "Disconnecting/removing" the virtual adapter and re-adding it fixed the same problem I had. – Patrick Jul 11 '12 at 16:03