Ignite error upgrading the setup in Kubernetes

Question

While upgrading the Ignite cluster deployed in Kubernetes (EKS) to address the Log4j vulnerability, I get the error below:

[ignite-1] Caused by: class org.apache.ignite.spi.IgniteSpiException: BaselineTopology of joining node (54b55de4-7742-4e82-9212-7158bf51b4a9) is not compatible with BaselineTopology in the cluster. Joining node BlT id (4) is greater than cluster BlT id (3). New BaselineTopology was set on joining node with set-baseline command. Consider cleaning persistent storage of the node and adding it to the cluster again.

The setup is a 3-node cluster with native persistence enabled (PVC). This has happened to us several times in our journey with Apache Ignite, even though we followed the official guide.

I cannot clean the storage because the pod keeps restarting: by the time I get a shell in the pod, it crashes and restarts.
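The error message itself states the rule that is being violated. Here is a toy Python model of that rule, to make the message easier to read. `JoiningNode` and `check_join` are hypothetical names, not Ignite classes. As far as I know, Ignite's real check also compares the baseline history, and this sketch does not model that. It shows why the message suggests cleaning the node's storage: a node with no stored baseline id is no longer "ahead" of the cluster.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class JoiningNode:
    """Toy stand-in for a restarting node (hypothetical, not an Ignite class)."""
    node_id: str
    # BaselineTopology id read from the node's persistent storage;
    # None models a node whose storage was cleaned.
    blt_id: Optional[int]


def check_join(cluster_blt_id: int, node: JoiningNode) -> Optional[str]:
    """Return a rejection reason, or None if this simplified check lets the node join.

    Only models the rule quoted in the error message; the real check also
    compares baseline history, which is not modelled here.
    """
    if node.blt_id is not None and node.blt_id > cluster_blt_id:
        # The node has recorded a baseline change the running cluster never saw.
        return (f"Joining node BlT id ({node.blt_id}) is greater than "
                f"cluster BlT id ({cluster_blt_id})")
    return None


if __name__ == "__main__":
    stale = JoiningNode("54b55de4-7742-4e82-9212-7158bf51b4a9", blt_id=4)
    print(check_join(3, stale))    # rejected, as in the question
    cleaned = JoiningNode(stale.node_id, blt_id=None)
    print(check_join(3, cleaned))  # None: joins, but must then be re-added to the baseline
```

The cleaned node can join, but it is a new node from the cluster's point of view. It still has to be added to the baseline again, which matches the "adding it to the cluster again" part of the message.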

Can you explain what you're doing to perform the upgrade? From and to which versions? The error message suggests that you're changing the baseline during the upgrade... which isn't right. — Stephen Darlington, Jan 27 '22 at 09:45
I'm not modifying the baseline at all. I have a Helm chart for Ignite; I change the image tag from 2.10.0 to 2.12.0 and run the Helm chart. Do I have to deactivate the cluster or remove the baseline during the upgrade? How do I upgrade with the Helm chart without downtime for my Ignite StatefulSet deployment? — vvra, Jan 31 '22 at 06:24
There's no "official" Helm chart for Ignite, so I'm not sure what it's doing. However, you can't do "rolling upgrades" with Ignite, so if that's what it's trying to do it will fail. To move from one version to the next you'll have to shut down the cluster. — Stephen Darlington, Jan 31 '22 at 10:53
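With the default `RollingUpdate` strategy, changing the image tag in a StatefulSet's pod template makes Kubernetes replace the pods one at a time, starting from the highest ordinal. For a while the cluster therefore mixes 2.10.0 and 2.12.0 nodes. That is exactly the rolling upgrade described above as unsupported. Below is a minimal sketch of a stop-everything upgrade instead. The Python only builds the command lists and does not run them. It assumes a StatefulSet and Helm release both named `ignite`, a chart at `./ignite`, and a values key `image.tag`. All of these names are hypothetical, so substitute your own.

```python
import shlex
from typing import List


def full_restart_upgrade_plan(sts: str, replicas: int, release: str,
                              chart: str, new_tag: str) -> List[List[str]]:
    """Build (but do not run) commands for a stop-everything Ignite upgrade."""
    # StatefulSet pods are named <statefulset>-<ordinal>.
    pods = [f"pod/{sts}-{i}" for i in range(replicas)]
    return [
        # 1. Stop every Ignite node; the PVCs (and their data) are kept.
        ["kubectl", "scale", "statefulset", sts, "--replicas=0"],
        # 2. Wait until no old-version pod is left. Depending on the kubectl
        #    version, this may complain if a pod is already gone.
        ["kubectl", "wait", "--for=delete", *pods, "--timeout=600s"],
        # 3. Switch the image tag. "image.tag" is a hypothetical values key;
        #    use whatever key your chart defines.
        ["helm", "upgrade", release, chart, "--reuse-values",
         "--set", f"image.tag={new_tag}"],
        # 4. Start all nodes on the new version (a no-op if the Helm upgrade
        #    already restored the replica count).
        ["kubectl", "scale", "statefulset", sts, f"--replicas={replicas}"],
    ]


if __name__ == "__main__":
    # Review the plan first; run each entry with subprocess.run(cmd, check=True).
    for cmd in full_restart_upgrade_plan("ignite", 3, "ignite", "./ignite", "2.12.0"):
        print(shlex.join(cmd))
```

This does involve downtime. The point of step 2 is that no pod running the new version can start while any pod running the old version is still alive.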

Answer (answered Jan 31 '22 at 23:53)

This might be caused by the wrong startup order; starting the nodes manually in reverse order may resolve it, but I'm not sure that is possible in K8s. Another possible cause is baseline auto-adjustment, which can change your baseline unexpectedly; I suggest turning it off if it's enabled.
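A minimal sketch for inspecting the baseline and turning off auto-adjustment with Ignite's `control.sh` through `kubectl exec`. As before, the Python only builds the commands. Several parts are assumptions: the pod name `ignite-0`, the path to `control.sh` (based on the `apacheignite/ignite` image keeping Ignite under `/opt/ignite/apache-ignite`), and the exact `--baseline auto_adjust disable --yes` syntax, which exists in recent 2.x releases as far as I know. Check `control.sh --help` in your image before running anything.

```python
from typing import List

# Assumption: location of control.sh inside the container; depends on the image.
CONTROL_SH = "/opt/ignite/apache-ignite/bin/control.sh"


def control_sh(pod: str, *args: str) -> List[str]:
    """Build a `kubectl exec` command that runs control.sh inside an Ignite pod."""
    return ["kubectl", "exec", pod, "--", CONTROL_SH, *args]


# Print the current baseline nodes so you can see what the cluster expects.
show_baseline = control_sh("ignite-0", "--baseline")

# Disable baseline auto-adjustment so pod restarts cannot silently change the
# baseline; --yes skips the interactive confirmation prompt.
disable_auto_adjust = control_sh("ignite-0", "--baseline", "auto_adjust", "disable", "--yes")

if __name__ == "__main__":
    print(" ".join(show_baseline))
    print(" ".join(disable_auto_adjust))
```

Run these against a pod that is actually up and part of the active cluster, not the crash-looping one.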

One (admittedly tricky) workaround for cleaning the DB of a failing pod is to replace the Ignite image with a simple image, such as plain Debian or Alpine (just to get CLI access), while keeping the same PVC attached; once you have fixed the persistence issue, set the Ignite image back. Another is to access the underlying PV directly, if possible, and do the surgery in place.
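A variation on the first workaround: instead of swapping the image inside the StatefulSet, start a separate throwaway pod that mounts the same PVC and simply sleeps, so you get a stable shell. The sketch below builds such a Pod manifest as JSON, which `kubectl apply -f -` accepts. The claim name is hypothetical. It follows the StatefulSet pattern `<volumeClaimTemplate>-<statefulset>-<ordinal>`, and you can list the real names with `kubectl get pvc`. The mount path `/persistence` is also an arbitrary choice. The Ignite pod that normally uses this claim must not be running at the same time. With a `ReadWriteOnce` volume on a different node, the debug pod will not even start until the Ignite pod is gone.

```python
import json


def debug_pod(claim_name: str, mount_path: str = "/persistence") -> dict:
    """Pod manifest that mounts an existing PVC and sleeps, so you can exec into it."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": f"pvc-debug-{claim_name}"},
        "spec": {
            "restartPolicy": "Never",
            "containers": [{
                "name": "shell",
                "image": "debian:bookworm-slim",
                # Keep the container alive without starting Ignite.
                "command": ["sleep", "infinity"],
                "volumeMounts": [{"name": "data", "mountPath": mount_path}],
            }],
            "volumes": [{
                "name": "data",
                "persistentVolumeClaim": {"claimName": claim_name},
            }],
        },
    }


if __name__ == "__main__":
    # Hypothetical claim name; pipe the output into `kubectl apply -f -`.
    print(json.dumps(debug_pod("ignite-persistence-ignite-1"), indent=2))
```

Once the pod is running, `kubectl exec -it <pod name> -- bash` gives you a shell on the volume. Delete the debug pod before bringing the Ignite pod back, so the two never use the files at the same time.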

I tried cleaning up the DB, but the data seems to be empty. Baseline auto-adjustment could be the issue; I will disable it and try again. — vvra, Feb 07 '22 at 04:22
