
We have two instances of the same Galera cluster, i.e. the same database schema, with about one month's difference in data between them. They are, respectively, a staging and a production environment.

Both run on identical hardware, with the same operating system, the same Galera version, and the same three-node configuration. Network latency is a non-issue: each node lives in a separate Docker container on the same machine. Both clusters are regularly backed up.

There is a particularly heavy query that takes up to 2 seconds on staging, while on production it takes up to two and a half minutes. Examining the query plans, the row counts at each step are of roughly the same order of magnitude, but the plans diverge in strategy at the following two steps:

staging (2 s):

    id  select_type  table  type  possible_keys                   key            key_len  ref            rows  Extra
    8   DERIVED      b      ref   bay_depend_id,bay_relies_on_id  bay_depend_id  4        rwms.l.bay_id  2     Using where
    8   DERIVED      b      ref   bay_depend_id,bay_relies_on_id  bay_depend_id  4        rwms.l.bay_id  2     Using where

production (2.5 min):

    id  select_type  table  type  possible_keys                   key   key_len  ref   rows  Extra
    8   DERIVED      b      ALL   bay_depend_id,bay_relies_on_id  NULL  NULL     NULL  1     Using where; Using join buffer (flat, BNL join)
    8   DERIVED      b      ALL   bay_depend_id,bay_relies_on_id  NULL  NULL     NULL  1     Using where; Using join buffer (incremental, BNL join)

The plans for all other steps are identical apart from the row counts, but at these two steps the evaluation strategy changes: staging does an indexed ref lookup on bay_depend_id, while production falls back to a full scan (ALL) with a block nested-loop (BNL) join buffer. I can't wrap my head around why this happens, or how to resolve it, since every operating condition is effectively the same.
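For illustration, here is a minimal sketch of the kind of index hint that would pin the plan to what staging chooses. The table names locations and bay_links are hypothetical stand-ins for the aliases l and b, the join condition is inferred from the EXPLAIN ref column, and FORCE INDEX is standard MySQL/MariaDB syntax; the real query is not shown above:

    -- Hypothetical sketch: real table names and the full query are elided.
    SELECT l.bay_id, b.bay_relies_on_id
      FROM locations AS l                      -- stand-in for alias `l`
      JOIN bay_links AS b
           FORCE INDEX (bay_depend_id)         -- force the index staging uses
        ON b.bay_depend_id = l.bay_id;         -- inferred from ref = rwms.l.bay_id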

urobo

1 Answer


How big are those tables? (On each node.)

EXPLAIN output is not always consistent between servers, but your example is rather drastic and deserves more investigation. Please provide the query, the table size(s), the output of EXPLAIN FORMAT=JSON SELECT ..., and SHOW CREATE TABLE.
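For reference, gather these on both clusters and compare. The rwms schema name comes from the EXPLAIN ref column; the table name below is a placeholder, since only the alias b appears in the question:

    -- Table sizes per node: row estimates and data/index footprint.
    SELECT TABLE_NAME, TABLE_ROWS, DATA_LENGTH, INDEX_LENGTH
      FROM information_schema.TABLES
     WHERE TABLE_SCHEMA = 'rwms';

    SHOW CREATE TABLE rwms.your_table;   -- placeholder: the table behind alias `b`

    EXPLAIN FORMAT=JSON
    SELECT ...;                          -- the heavy query, elided here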

Doing this may get them in sync:

    ANALYZE TABLE ...

It is fast and rebuilds the statistics for the given table. Note: statistics are not shared between nodes, so run it on each node.
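For example, a minimal sketch (bay_links is a hypothetical table name; LOCAL is a synonym for NO_WRITE_TO_BINLOG and keeps the statement out of the binary log):

    -- Per the note above, run this on every node, since optimizer
    -- statistics are maintained per node and not shared.
    ANALYZE LOCAL TABLE bay_links;   -- hypothetical name; substitute your table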

Rick James