1

I have a Laravel (Lumen 5.2) project that run against a MariaDB Galera Cluster. When running the app it seems to work just fine. But when I run the PHPUnit tests, they randomly fails.

The problem is that I populate the database and then trying to get the data (ids) to populate other tables with a foreign key. But when trying to get the data immediately after, the data is null.

The Laravel database connection is used with a READ user and a WRITE user. (Laravel automatically uses the correct one when inserting or reading). And I think this is the problem somehow. When I only use the WRITE user, the tests works just fine.

patricus
  • 59,488
  • 15
  • 143
  • 145
emajl
  • 25
  • 5
  • Are you writing to all nodes and reading from all nodes? Or are you 'simulating' master-slave? Either way works, but you need to understand the subtle differences. – Rick James Apr 22 '16 at 21:57
  • Yes, it is wise to have two types of user, but what load balancing / routing / etc is behind each user? – Rick James Apr 22 '16 at 21:58

1 Answers1

4
SET SESSION wsrep_sync_wait = 1;

before the SELECT will guarantee that the writes have caught up.

Although Galera is "synchronous", it is not quite. The writes are guaranteed to be sent to all the other nodes, and that they will work there. However, in the case of a "critical" read, a SELECT can get to the recipient node too fast to see the write. The above setting solves that problem.

Now, let's get real about how you should implement, say, a web site with Galera under the covers.

  • When practical, such as in a transaction, do all the commands against the same node. There is no penalty for this. But, check for errors after the COMMIT.
  • When not practical -- such as starting a new web page from a new HTTP connection -- use that SET.

While there is, potentially, a slight delay in the SELECT to wait for replication, the delay is usually very close to zero. I would suggest that your tests are effectively stress tests will deceptively say that the wait is high. That is, a benchmark is usually designed to find the "worst", not the "typical".

How much delay is there between the write and the failing SELECT? Perhaps only 1ms. How fast can a 'user' post a 'blog', then get to the next page and find it "missing"? Perhaps over 100ms.

Your stress test has discovered the need for the SET, not that Galera is 'broken'.

More Galera Caveats.

Rick James
  • 135,179
  • 13
  • 127
  • 222
  • Thanks! Now I set the wsrep_sync_wait = 1 like you said when I run my tests and it works. – emajl Apr 25 '16 at 09:53