I have my rook-ceph cluster running on AWS. It's loaded up with data. Is there any way to simulate a POWER FAILURE so that I can test the behaviour of my cluster?


- Rook is not OK for production. It is very unstable and you can not resize your volume. – yasin lachini Jul 01 '19 at 15:42
- For a power failure you can reboot your host, but I think you mean another thing. – yasin lachini Jul 01 '19 at 15:44
- What do you mean by host? – Rajat Singh Jul 01 '19 at 17:04
- I know Rook is unstable, but I'm running rook-ceph. Do you know how to simulate a power failure? – Rajat Singh Jul 01 '19 at 17:04
- You need Kubernetes to run Rook. There are hosts (machines) that the Kubernetes control plane and its workers are installed on. You should reboot them. – yasin lachini Jul 01 '19 at 17:24
- I'm using Kubernetes only to run my `rook-ceph` cluster. What is the host, actually? I'm not getting what you are trying to say by that. – Rajat Singh Jul 02 '19 at 09:52
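In practice, "rebooting the host" in the comments above means rebooting the Kubernetes worker nodes that back the cluster. A minimal sketch, assuming SSH access to the nodes (the SSH user and address are placeholders):
# List the nodes (hosts) backing the cluster and their addresses
kubectl get nodes -o wide
# Hard-reboot one worker node over SSH to approximate a power cycle
ssh ec2-user@<node-external-ip> 'sudo reboot -f'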
3 Answers
From Docker you can send the kill signal "SIGPWR", which corresponds to a power failure (System V):
docker kill --signal="SIGPWR" <container>
and from Kubernetes:
kubectl exec <pod> -- /killme.sh
where the script killme.sh is:
----- beginning of script -----
#!/bin/bash
# Find the iperf process to target
kiperf=$(pidof iperf)
# Send signal 30 (SIGPWR) to the iperf process
kill -30 $kiperf
----- end of script -----
Signal 30 (SIGPWR) you can find here.
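If the image in the pod ships pidof and kill (for example via busybox or procps), an equivalent one-liner avoids copying a script into the pod at all; iperf here just mirrors the example above and is a placeholder for your own process:
# Send signal 30 (SIGPWR on x86 Linux) to the target process inside the pod
kubectl exec <pod> -- sh -c 'kill -30 $(pidof iperf)'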

It depends on what the purpose of your crash test is. I see two options:
- You want to test whether you deployed Kubernetes on AWS correctly; then I'd terminate the related AWS EC2 instance (or set of instances), as sketched after this list.
- You want to test whether your end application is resilient to Kubernetes node failures; then I'd just check which Pods are running on the given node and kill them all abruptly with:
kubectl delete pods <pod> --grace-period=0 --force
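For the first option, a minimal AWS CLI sketch (the instance ID is a placeholder); stop-instances with --force skips the graceful OS shutdown, which is closer to pulling the power:
# Hard-stop the EC2 instance backing a node, skipping graceful OS shutdown
aws ec2 stop-instances --instance-ids <instance-id> --force
# Or remove the instance entirely to test node replacement
aws ec2 terminate-instances --instance-ids <instance-id>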

Cluster Pods do not disappear until someone (a person or a controller) destroys them, or there is an unavoidable hardware or system software error.
Developers call these unavoidable cases involuntary disruptions to an application. Examples are:
- a hardware failure of the physical machine backing the node
- cluster administrator deletes VM (instance) by mistake
- cloud provider or hypervisor failure makes the VM disappear
- a kernel panic
- the node disappears from the cluster due to cluster network partition
- eviction of a pod due to the node being out-of-resources
Except for the out-of-resources condition, all of these conditions should be familiar to most users; they are not specific to Kubernetes.
Developers call other cases voluntary disruptions. These include both actions initiated by the application owner and those initiated by a Cluster Administrator.
Typical application owner actions include:
- deleting the deployment or other controller that manages the pod
- updating a deployment’s pod template causing a restart
- directly deleting a pod (e.g. by accident)
More information you can find here: kubernetes-disruption, application-disruption.
You can set up Prometheus on your cluster and measure metrics during the failure.
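As a rough sketch of what that could look like during the simulated outage (this assumes the Ceph mgr Prometheus module is being scraped and the Rook toolbox is deployed; the host, namespace and deployment names are placeholders for your own setup):
# Query overall Ceph health from Prometheus: 0 = HEALTH_OK, 1 = HEALTH_WARN, 2 = HEALTH_ERR
curl -s 'http://<prometheus-host>:9090/api/v1/query?query=ceph_health_status'
# Or watch recovery directly from the Rook toolbox pod
kubectl -n rook-ceph exec deploy/rook-ceph-tools -- ceph status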
