
I am upgrading a MariaDB Galera cluster from MariaDB 10.1.44 to MariaDB 10.4.13 on Red Hat. I upgraded one member of the cluster and, of course, ran mysql_upgrade --skip-write-binlog once the package update finished. I then re-added it to the cluster.

At first glance, it appears that the node is synced to the cluster: show status like 'wsrep_local_state_comment' reports Synced, show global status like 'wsrep_cluster_status' reports Primary, and show status like 'wsrep_evs_state' reports Operational.
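For reference, those are the checks I am running on the upgraded node (expected values shown in comments):

```sql
SHOW STATUS LIKE 'wsrep_local_state_comment';   -- Synced
SHOW GLOBAL STATUS LIKE 'wsrep_cluster_status'; -- Primary
SHOW STATUS LIKE 'wsrep_evs_state';             -- Operational
```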

However, whenever we start to send traffic to the node, it falls on its face: it temporarily gets out of sync with the rest of the cluster. As soon as we stop sending traffic, it shows as synced again.

Checking the logs, the following warning appears on every transaction:

[Warning] WSREP: trx protocol version: 4 does not match certification protocol version: -1

SHOW GLOBAL STATUS LIKE 'wsrep_protocol_version' currently has a Value of: -1.

If I go run that same SHOW GLOBAL STATUS LIKE command on one of the cluster members that has NOT been upgraded and is still running on MariaDB 10.1.44, I see that wsrep_protocol_version is: 9.

Checking a healthy MariaDB 10.4.13 cluster in my lab environment where all 3 nodes are running on MariaDB 10.4, I see that their wsrep_protocol_version value is: 10.

I have read here and here that it's a good idea to set the following in the appropriate my.cnf file while the upgrade is taking place. This forces the "EVS Version" to remain compatible with the older nodes of the Galera cluster until Galera can be upgraded on the remaining nodes: wsrep_provider_options="evs.version=0" (See also: MariaDB evs.version)
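This is how I have the setting applied on the upgraded 10.4 node (a sketch; the exact section name depends on your config layout, e.g. [mysqld] vs. [galera]):

```ini
[mysqld]
# Pin the EVS protocol version so the upgraded 10.4 node stays
# wire-compatible with the remaining 10.1 nodes. Intended to be
# removed once the whole cluster is on 10.4.
wsrep_provider_options="evs.version=0"
```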

However, that doesn't appear to fix my problem. Literally EVERY transaction produces the WSREP: trx protocol warning; I'm seeing several lines hit the log file per second (this is a busy cluster).

Is there any way to fix this issue without taking down the cluster, or will I need to take down the entire cluster and upgrade all of its nodes before bringing it back online?
