SET SESSION wsrep_sync_wait = 1;
before the SELECT will guarantee that the writes have caught up.
Although Galera is described as "synchronous", it is not quite. At COMMIT, a write is guaranteed to have been sent to all the other nodes, and guaranteed to succeed there, but it may not have been applied there yet. So, in the case of a "critical" read, a SELECT can get to the recipient node too fast to see the write. The above setting solves that problem by making the SELECT wait until that node has caught up.
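A minimal sketch of a critical read, assuming a hypothetical blog_posts table and a session that otherwise runs with wsrep_sync_wait at 0 (its default). Resetting it afterwards keeps non-critical reads from waiting:

```sql
-- Hypothetical schema: blog_posts(id, author_id, title, created_at).
-- 1 = make reads (SELECT) wait until this node has caught up.
SET SESSION wsrep_sync_wait = 1;

-- Critical read: must see a row that was just written, possibly via another node.
SELECT id, title
  FROM blog_posts
 WHERE author_id = 42
 ORDER BY created_at DESC
 LIMIT 1;

-- Back to no waiting for the rest of the session.
SET SESSION wsrep_sync_wait = 0;
```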
Now, let's get real about how you should implement, say, a web site with Galera under the covers.
- When practical, such as in a transaction, do all the commands against the same node. There is no penalty for this. But, check for errors after the COMMIT.
- When not practical -- such as starting a new web page from a new HTTP connection -- use that SET.
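The two cases above might look like this. The table and column names are hypothetical. The comment about error 1213 describes how Galera typically reports a certification conflict, as a deadlock error at COMMIT. Treat this as a sketch, not a complete recipe.

```sql
-- Case 1: one HTTP request, one connection, one node.
START TRANSACTION;
INSERT INTO blog_posts (author_id, title) VALUES (42, 'Hello');
UPDATE authors SET post_count = post_count + 1 WHERE id = 42;
COMMIT;
-- The application must check the result of COMMIT. If another node won a
-- conflicting write, the COMMIT fails (typically error 1213, ER_LOCK_DEADLOCK)
-- and the whole transaction should be replayed from START TRANSACTION.

-- Case 2: the next page load arrives on a new connection, possibly on a different node.
SET SESSION wsrep_sync_wait = 1;
SELECT id, title FROM blog_posts WHERE author_id = 42;
```

Because the setting is per session, only the connections that need a critical read pay for the wait.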
While there is, potentially, a slight delay in the SELECT to wait for replication, the delay is usually very close to zero. I would suggest that your tests, being effectively stress tests, deceptively say that the wait is high. That is, a benchmark is usually designed to find the "worst", not the "typical".
How much delay is there between the write and the failing SELECT? Perhaps only 1ms. How fast can a 'user' post a 'blog', then get to the next page and find it "missing"? Perhaps over 100ms.
Your stress test has discovered the need for the SET, not that Galera is 'broken'.
More Galera Caveats.