<property name="shutdownPolicy" value="GRACEFUL"/>
is supposed to protect against partition loss if the following conditions are met:
1. The caches must be either PARTITIONED with backups > 0 or REPLICATED. Check your configs. The default cache config in Ignite is PARTITIONED with backups = 0 (for historical reasons), so the defaults won't work.
2. There must be more than one baseline node (only baseline nodes store data!). See the Ignite documentation on baseline topology.
3. You must stop the nodes in a graceful way. This is a bit tricky since you don't always control this.
- If you stop with a kill to the process, make sure it sends SIGTERM and not SIGKILL, because the latter cannot be caught and always kills the process immediately, so no shutdown hook runs.
- If you stop with Ignite.close(), this should just work.
- If you stop with Java System.exit(), it'll work, but if you use Runtime.getRuntime().halt(), it won't (because halt() is not graceful: it terminates the JVM without running shutdown hooks).
- If you use orchestrators such as Kubernetes, you need to make sure they stop the nodes gracefully. For example, in Kubernetes you normally have to set terminationGracePeriodSeconds to a high value so that Kubernetes waits for the nodes to finish graceful shutdown instead of killing them.
- If you use custom startup scripts, you need to make sure they forward signals to the Ignite process.
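For the first condition (cache mode and backups), a minimal Spring XML sketch might look like the following. The cache name myCache and the backup count of 1 are illustrative assumptions, not something taken from your setup; the only requirement is backups > 0 for a PARTITIONED cache (or a REPLICATED cache instead).

```xml
<bean class="org.apache.ignite.configuration.IgniteConfiguration">
    <!-- Wait on shutdown until stopping this node would not lose partitions -->
    <property name="shutdownPolicy" value="GRACEFUL"/>

    <property name="cacheConfiguration">
        <list>
            <bean class="org.apache.ignite.configuration.CacheConfiguration">
                <!-- "myCache" is a hypothetical name used only for illustration -->
                <property name="name" value="myCache"/>
                <property name="cacheMode" value="PARTITIONED"/>
                <!-- Assumed value: any number greater than 0 satisfies the condition -->
                <property name="backups" value="1"/>
            </bean>
        </list>
    </property>
</bean>
```

Caches created dynamically from code or from a client carry their own configuration, so they need the same backups setting there; this fragment only covers caches declared in the server's XML.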
To debug this, check the points above. I would normally start by looking at the server logs (with IGNITE_QUIET=false!) to see if the "Invoking shutdown hook" message is there. If it isn't, your shutdown hook isn't getting called, and the problem is one of the points under 3. Otherwise, there should be other log messages explaining the situation.