Diagnose why server went down

Question

I have a couple of ASP.NET web apps running on a Windows Server 2008 R2 VPS. We have been using this VPS for years. Over the last few months, our apps have gone down for 30-45 minutes at a time. This doesn't happen at regular intervals or at the same time of day. It has happened maybe 4 or 5 times in the last 2 months. Our analytics don't report a large number of users simultaneously online during these periods. We have had more users online at other times with no issues.

During the downtime, we cannot RDP into the VPS. New Relic monitoring shows zero activity on any front. After the VPS is back online, the apps work normally. Even after the VPS is back online, New Relic doesn't show any entries for the downtime period. The Event Viewer also shows no entries during the downtime period. We have the usual entries in the System/Security/Application logs, almost one per minute, up to the moment the downtime began, and the next entry appears only after the downtime was over.

It looks almost as if our VPS was put to sleep for that duration. I have checked the Event Viewer for events with IDs 6005, 6008, 6009, 6013, 1072, 1074 and 1076. I read in various internet posts that these event IDs can help identify planned/unexpected shutdowns/restarts. I didn't find any for this time range.

What else can I do to identify why this is happening and to prevent it from happening?

EDIT

This instance of the downtime was due to the host rebooting the physical server. With regard to the previous downtimes, the host claims not to have been involved. Let's see. I am currently marking @Greg's post as the accepted answer, as that was something I had not considered doing until now.

score 2 · Answer 1 · answered Sep 11 '13 at 14:58

Contact your VPS provider. It could be a hardware failure, a network issue, or any number of other things. The loss of remote connectivity suggests the problem is outside your app and probably outside the OS. Your vendor should be able to help you diagnose the issue. If not, I think you will still have an answer as to what to do about your reliability issues.
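One way to collect evidence for that conversation is to probe the VPS from a machine outside the provider's network. The following is only a minimal sketch: the function names are made up for illustration, and the ports assume the default HTTP (80) and RDP (3389) ports are the ones exposed.

```python
import socket


def port_is_open(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False


def probe(host, ports=(80, 3389)):
    """Check each port once and return a {port: reachable} mapping.

    The defaults are assumptions: 80 for the web apps, 3389 for RDP.
    """
    return {port: port_is_open(host, port) for port in ports}
```

Calling `probe("vps.example.com")` (a placeholder host name) every minute from an external machine and logging the result with a timestamp would show whether HTTP and RDP drop at the same moment. If they do, and the guest logs nothing for the same window, that points away from the apps, though it does not by itself identify the cause.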

answered Sep 11 '13 at 14:58

Daniel Widrick


We don't have a "managed VPS" service with them, meaning they give us few or no stats. When contacted, they responded that they hadn't done anything and that, from their end, the VPS was running :( – Amith George Sep 11 '13 at 15:32
After the downtime was over, the host posted an update mentioning they had rebooted the physical server (some emergency, it seems), which caused the virtual machines to be down for 30 mins. It seems there was some confusion as to which physical server my VPS was on. Silly. ... Thank you for your answer. – Amith George Sep 12 '13 at 18:36
I'm glad they were able to sort that for you, and hopefully it will remain stable now. – Daniel Widrick Sep 12 '13 at 18:38

score 2 · Accepted Answer · answered Sep 11 '13 at 16:25

What else can you do? Enable ASP.NET Health Monitoring with a heartbeat at one-minute intervals. If the heartbeat is missing for the outage period, the cause is most likely something external to Windows/IIS/ASP.NET.
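Once heartbeats are being recorded, the outage windows can be read off as gaps between consecutive timestamps. Here is a minimal sketch, assuming the heartbeat times have been exported one per line in `YYYY-MM-DD HH:MM:SS` form; that export format, the function names and the two-minute threshold are all assumptions, not something Health Monitoring produces by default.

```python
from datetime import datetime, timedelta


def parse_heartbeats(lines, fmt="%Y-%m-%d %H:%M:%S"):
    """Parse one timestamp per line, skipping blank lines."""
    return [datetime.strptime(line.strip(), fmt) for line in lines if line.strip()]


def find_gaps(timestamps, max_interval=timedelta(minutes=2)):
    """Return (last_seen, next_seen, duration) for each silence longer than max_interval."""
    ordered = sorted(timestamps)
    gaps = []
    # Compare every heartbeat with the one that follows it.
    for earlier, later in zip(ordered, ordered[1:]):
        if later - earlier > max_interval:
            gaps.append((earlier, later, later - earlier))
    return gaps
```

The same function works on timestamps exported from the System/Application event logs, since those normally arrive about once a minute as well. A gap that shows up in the heartbeats, the event logs and external monitoring all at once suggests the whole guest was stopped rather than a single app misbehaving.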

It seems more likely that your service provider caused the outage than that an OS defect did. Either way, such an outage would most likely have zero impact if you were using multiple servers across multiple hosts/networks.

If you don't have any metrics, measurements, or an availability agreement, you aren't going to have much luck reverse engineering an answer by asking the guest OS to explain problems in the VPS provider's hosts or network.

Unfortunately, moving apps to the "cloud" cannot fix broken or dysfunctional architecture or contracting skills.

The host later posted an update saying they had to restart the physical server due to an emergency. This current downtime was their fault. I will look into ASP.NET Health Monitoring. Thanks for your answer. – Amith George Sep 12 '13 at 18:38
