I have my rook-ceph cluster running on AWS. It's loaded up with data. Is there any way to simulate a POWER FAILURE so that I can test the behaviour of my cluster?


- Rook is not OK for production. It is very unstable and you can not resize your volume. – yasin lachini Jul 01 '19 at 15:42
- For a power failure you can reboot your host, but I think you mean another thing. – yasin lachini Jul 01 '19 at 15:44
- What do you mean by host? – Rajat Singh Jul 01 '19 at 17:04
- I know Rook is unstable, but I'm running rook-ceph. Do you know how to simulate a power failure? – Rajat Singh Jul 01 '19 at 17:04
- You need Kubernetes to run Rook. There are hosts (machines) that the Kubernetes control plane and its workers are installed on. You should reboot them. – yasin lachini Jul 01 '19 at 17:24
- I'm using Kubernetes only to run my `rook-ceph` cluster. What is the host, actually? I'm not getting what you are trying to say by that. – Rajat Singh Jul 02 '19 at 09:52
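In practice, "rebooting the host" in the comments above means rebooting the Kubernetes worker nodes that back the cluster. A minimal sketch, assuming SSH access to the nodes (the SSH user and address are placeholders):
# List the nodes (hosts) backing the cluster and their addresses
kubectl get nodes -o wide
# Hard-reboot one worker node over SSH to approximate a power cycle
ssh ec2-user@<node-external-ip> 'sudo reboot -f'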
3 Answers
From Docker you can send the kill signal "SIGPWR", which corresponds to a power failure (System V):
docker kill --signal="SIGPWR" <container>
and from Kubernetes:
kubectl exec <pod> -- /killme.sh
where the script killme.sh is:
----- beginning of script -----
#!/bin/bash
# Find the iperf process to target
kiperf=$(pidof iperf)
# Send signal 30 (SIGPWR) to the iperf process
kill -30 $kiperf
----- end of script -----
Signal 30 (SIGPWR) you can find here.
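If the image in the pod ships pidof and kill (for example via busybox or procps), an equivalent one-liner avoids copying a script into the pod at all; iperf here just mirrors the example above and is a placeholder for your own process:
# Send signal 30 (SIGPWR on x86 Linux) to the target process inside the pod
kubectl exec <pod> -- sh -c 'kill -30 $(pidof iperf)'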

It depends on what the purpose of your crash test is. I see two options:
- You want to test whether you deployed Kubernetes on AWS correctly; then I'd terminate the related AWS EC2 instance (or set of instances), as sketched after this list.
- You want to test whether your end application is resilient to Kubernetes node failures; then I'd just check which Pods are running on the given node and kill them all abruptly with:
kubectl delete pods <pod> --grace-period=0 --force
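For the first option, a minimal AWS CLI sketch (the instance ID is a placeholder); stop-instances with --force skips the graceful OS shutdown, which is closer to pulling the power:
# Hard-stop the EC2 instance backing a node, skipping graceful OS shutdown
aws ec2 stop-instances --instance-ids <instance-id> --force
# Or remove the instance entirely to test node replacement
aws ec2 terminate-instances --instance-ids <instance-id>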

Cluster Pods do not disappear until someone (a person or a controller) destroys them, or there is an unavoidable hardware or system software error.
Developers call these unavoidable cases involuntary disruptions to an application. Examples are:
- a hardware failure of the physical machine backing the node
- cluster administrator deletes VM (instance) by mistake
- cloud provider or hypervisor failure makes the VM disappear
- a kernel panic
- the node disappears from the cluster due to cluster network partition
- eviction of a pod due to the node being out-of-resources
Except for the out-of-resources condition, all of these conditions should be familiar to most users; they are not specific to Kubernetes.
Developers call other cases voluntary disruptions. These include both actions initiated by the application owner and those initiated by a Cluster Administrator.
Typical application owner actions include:
- deleting the deployment or other controller that manages the pod
- updating a deployment’s pod template causing a restart
- directly deleting a pod (e.g. by accident)
More information you can find here: kubernetes-disruption, application-disruption.
You can set up Prometheus on your cluster and measure metrics during the failure.
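As a rough sketch of what that could look like during the simulated outage (this assumes the Ceph mgr Prometheus module is being scraped and the Rook toolbox is deployed; the host, namespace and deployment names are placeholders for your own setup):
# Query overall Ceph health from Prometheus: 0 = HEALTH_OK, 1 = HEALTH_WARN, 2 = HEALTH_ERR
curl -s 'http://<prometheus-host>:9090/api/v1/query?query=ceph_health_status'
# Or watch recovery directly from the Rook toolbox pod
kubectl -n rook-ceph exec deploy/rook-ceph-tools -- ceph status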
