5

First time setting up Galera (Ubuntu 14.04) and I'm unable to restart the cluster after rebooting my nodes. I'm following this guide, and it doesn't exactly mention how to handle a shutdown or reboot (for kernel patches, hypervisor updates, routine maintenance).

I found a bit of info here that explains how to find the node marked "safe_to_bootstrap", but I think it is describing a crashed node. The link above describes the following state as a crashed node with seemingly little hope of recovery:

    # cat /var/lib/mysql/grastate.dat
    # GALERA saved state
    version: 2.1
    uuid:    a4f9af07-f235-11e7-a0c0-233dd732dc29
    seqno:   -1
    safe_to_bootstrap: 1

When I try to start that node however, I get an error in daemon.log:

`WSREP: failed to open gcomm backend connection: 110: failed to reach primary view: 110 (Connection timed out)`

What's the best way to shut the cluster down, and how do I restart it safely? I'm assuming the customary reboot command is not adequate for cleanly shutting down the Galera cluster.

Server Fault

2 Answers

4

I know this is a late reply, but it may help anyone looking for how to safely shut down and restart a MariaDB Galera cluster.

For example, say we have three MariaDB Galera nodes (1, 2, 3) running on Ubuntu servers. To shut the cluster down safely without destroying it:

  1. Make sure there are no active transactions or connections against the cluster nodes.
  2. On node3, run the following to check that the node is up to date: `SHOW STATUS LIKE 'wsrep_local_state_comment';` You should see `Synced` as the returned value.
  3. Stop the MariaDB service: `sudo systemctl stop mariadb`
  4. Repeat the same steps on node2, and then on node1.
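The per-node check and stop in the steps above could be sketched as a small shell helper. This is only a sketch: the function names `wsrep_state` and `stop_if_synced` are made up for illustration, and it assumes the `mysql` client can log in without prompting (for example via `~/.my.cnf`).

```shell
# Extract the state value from tab-separated "SHOW STATUS" output on stdin.
# (Hypothetical helper name.)
wsrep_state() {
    awk -F '\t' '$1 == "wsrep_local_state_comment" { print $2 }'
}

# Stop MariaDB only if this node currently reports Synced.
# Assumes the mysql client can authenticate non-interactively.
stop_if_synced() {
    state=$(mysql -N -B -e "SHOW STATUS LIKE 'wsrep_local_state_comment'" | wsrep_state)
    if [ "$state" = "Synced" ]; then
        sudo systemctl stop mariadb
    else
        echo "node state is '${state}', not stopping" >&2
        return 1
    fi
}
```

Run `stop_if_synced` on node3 first, then node2, then node1, waiting for each to finish before moving on.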

The Galera cluster is now stopped cleanly. To start it again, begin with node1, since it was the last node to shut down and therefore holds the most recent state:

  1. On node1, run: `galera_new_cluster`
  2. Then on node2: `sudo systemctl start mariadb`
  3. On node3: `sudo systemctl start mariadb`
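Before running `galera_new_cluster`, it may be worth confirming that the node really is marked safe to bootstrap in `grastate.dat` (the file shown in the question). A minimal sketch, with hypothetical function names `grastate_field` and `bootstrap_if_safe`, assuming the default data directory `/var/lib/mysql`:

```shell
# Print the value of one field (e.g. seqno, safe_to_bootstrap) from a grastate.dat file.
# $1 = field name, $2 = path to the file. (Hypothetical helper name.)
grastate_field() {
    awk -v key="$1:" '$1 == key { print $2 }' "$2"
}

# Bootstrap a new cluster only if this node is marked safe_to_bootstrap: 1.
bootstrap_if_safe() {
    file=${1:-/var/lib/mysql/grastate.dat}   # assumed default location
    if [ "$(grastate_field safe_to_bootstrap "$file")" = "1" ]; then
        sudo galera_new_cluster
    else
        echo "not marked safe_to_bootstrap; start this node after the bootstrap node" >&2
        return 1
    fi
}
```

On the remaining nodes, the plain `sudo systemctl start mariadb` from the list above is still what joins them to the new cluster.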
ender.qa
2

For a graceful shutdown of the cluster, first verify the status of each node. If a node's status is synced, you can shut it down; do this one node at a time. The tricky part is starting the nodes back up: shutting every node down destroys the cluster, so it has to be re-created. If all nodes in the cluster are synchronized (that is, they contain the same positive "seqno" value), then any node can start the new cluster. If possible, I would test this heavily before running it in production.
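When the "seqno" values differ between nodes, the usual choice is to bootstrap from the node with the highest one. As a rough sketch, assuming you have already collected each node's seqno from its `grastate.dat` into "name seqno" lines (the node names and numbers below are made up, and `pick_bootstrap_node` is a hypothetical helper):

```shell
# Read "node seqno" pairs on stdin and print the name of the node
# with the highest seqno. (Hypothetical helper name.)
pick_bootstrap_node() {
    sort -k2,2n | tail -n 1 | awk '{ print $1 }'
}

# Example with made-up values:
printf 'node1 1204\nnode2 1210\nnode3 1198\n' | pick_bootstrap_node
```

A seqno of -1 does not tell you the node's real position, so a node reporting -1 should not be chosen on the basis of this comparison alone.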

Tux_DEV_NULL
  • Thanks. After learning more about Galera, it seems this isn't really a customary service that is started and shut down by init. None of the nodes really knows the state of its peers, so shutting one down (as you answered) basically splits the node from the cluster. I've been using `cat /var/lib/mysql/grastate.dat` to determine which node to start the cluster on, and so far it's working well. – Server Fault Jan 16 '18 at 17:21