Has anyone noticed that most server crashes at Hetzner occur at roughly 8:00 GMT+3? For example, here is what we got last month around that time:

  1. docker "Segmentation fault"

  2. Network connection disappeared on both interfaces

  3. Server node went down with our virtual server on it

  4. CPU usage went to 100% because of a kworker process

What could this be connected to? Is it a mystery, or some kind of cloud issue?

1 Answer

Incompetence or bad SLA?

Let's start with bad SLA. You get what you pay for - read the paperwork. Do they guarantee high uptime? It may just be that they do some infrastructure resets during what is a quiet period in their time zone (basically very, very early morning). Now, this should NOT be needed generally, but hey, who knows.

Incompetence. Patching etc. should not reset servers, and BOTH network connections going down would mean either that they are the same physical connection (and you do NOT have two connections outside the VM) or that someone decided to reset multiple instances at once, and THAT would be incompetence - you set up a redundant infrastructure only to then reset everything at once.

The core fact is that "not your cloud, not your infrastructure" applies here. Without access to the physical level you simply have no idea WHY this happens and cannot do anything about it. I would suggest opening a support ticket, as the people there DO have access to the physical level. In this day and age, you should have 100% uptime on virtualization UNLESS crazy things happen (i.e. a defect). Patching? Move VMs live over to another instance. Do rolling upgrades (i.e. one server out of a cluster at a time). Reset and update the network in a way that does not fault it (i.e. redundant hardware: update one, wait, then the other). You still do not effectively get 100% uptime, but any downtime should be attributable to non-standard operations.
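To make the "one server out of a cluster at a time" idea concrete, here is a minimal sketch of a rolling update loop. It is only an illustration under assumptions: `rolling_update`, `update` and `is_healthy` are hypothetical names, not anything Hetzner or a particular orchestration tool provides, and a real rollout would also drain workloads and wait between nodes.

```python
from typing import Callable, List


def rolling_update(
    nodes: List[str],
    update: Callable[[str], None],
    is_healthy: Callable[[str], bool],
) -> List[str]:
    """Update nodes strictly one at a time.

    Halts at the first node that fails its health check, so the rollout
    never takes more than one node out of service at once.
    """
    done: List[str] = []
    for node in nodes:
        # Hypothetical step: drain, patch and restart this node only.
        update(node)
        if not is_healthy(node):
            # Stop here; the remaining nodes are untouched and keep serving.
            raise RuntimeError(
                f"{node} unhealthy after update; rollout halted after {done}"
            )
        done.append(node)
    return done
```

The point of the design is that a bad patch takes out one node, not the whole redundant set: the nodes later in the list are never touched once a health check fails.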

Now, if you "own" the server node, then basically that is YOUR fault for not having anything redundant and/or not opening a ticket with the relevant authorities, because yes, it COULD be defective hardware - been there, seen that. Again, not your computer, not your access - you need THEIR technician on site.

TomTom
  • New brave cloud world)) – Vasiliy Shakhunov Sep 13 '19 at 08:31
  • Well, my computers, my cloud. That said, this may not be related. I rent servers regularly (I need hardware access for some timing-critical software) and I "repeatedly" (i.e. more than once in 20 years) got a server crashing - which went away after a server swap, often with a note that they found defective RAM. It is not unheard of - but the handling here is appalling. For a cloud, as I said, you can go for 100% uptime outside of defective hardware EASILY. – TomTom Sep 13 '19 at 08:43