1

I got Galera master-master cluster on MariaDB 10.4 and galera-4 as replication lib. It runs perfectly until some network issues happens between some node and the others. After node reconnect and resync it's state the whole cluster performance drops down drastically. And the only way to restore performance is to rebuild the whole cluster from a donor node.

This is what I got in log:

    2023-03-19  1:23:20 0 [Note] WSREP: ####### Adjusting cert position: 61149358 -> 61149359 
    2023-03-19  1:23:20 0 [Note] WSREP: Service thread queue flushed. 
    2023-03-19  1:23:20 0 [Note] WSREP: Lowest cert index boundary for CC from ist: 61149189 
    2023-03-19  1:23:20 0 [Note] WSREP: Min available from gcache for CC from ist: 60988106 
    2023-03-19  1:23:20 0 [Note] WSREP: Receiving IST...100.0% (545/545 events) complete. 
    2023-03-19  1:23:21 2 [Note] WSREP:
    ================================================ View:   id: 6caf7137-be5b-11ed-952d-8e2998c314a7:61149359   status: primary   protocol_version: 4   capabilities: MULTI-MASTER, CERTIFICATION, PARALLEL_APPLYING, REPLAY, ISOLATION, PAUSE, CAUSAL_READ, INCREMENTAL_WS, UNORDERED, PREORDERED, STREAMING, NBO   final: no   own_index: 2   members(4):
            0: 38ac2156-c3e1-11ed-b954-ba79e6d7687d, DB1
            1: 637c05dc-c17f-11ed-b3f2-0e364b289bef, DB3
            2: 667386bc-c5db-11ed-ad68-ee03169ac5b9, DB4
            3: c9c34048-c3e0-11ed-8cbc-13957f2241e4, DB2
    ================================================= 
    2023-03-19  1:23:21 2 [Note] WSREP: Server status change initialized -> joined                 
    2023-03-19  1:23:21 2 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification. 
    2023-03-19  1:23:21 2 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification. 
    2023-03-19  1:23:21 2 [Note] WSREP: Draining apply monitors after IST up to 61149359 
    2023-03-19  1:23:21 2 [Note] WSREP: IST received: 6caf7137-be5b-11ed-952d-8e2998c314a7:61149359 
    2023-03-19  1:23:21 2 [Note] WSREP: Lowest cert index boundary for CC from sst: 61149189
    2023-03-19  1:23:21 2 [Note] WSREP: Min available from gcache for CC from sst: 60988107 
    2023-03-19  1:23:21 0 [Note] WSREP: 2.0 (DB4): State transfer from 3.0 (DB2) complete. 
    2023-03-19  1:23:21 0 [Note] WSREP: Shifting JOINER -> JOINED (TO: 61149618) 
    2023-03-19  1:23:21 0 [Note] WSREP: Processing event queue:...  0.0% (  0/260 events) complete. 
    2023-03-19  1:23:32 2 [Note] WSREP: Processing event queue:... 56.8% (208/366 events) complete. 
    2023-03-19  1:23:40 0 [Note] WSREP: Member 2.0 (DB4) synced with group.
    2023-03-19  1:23:40 0 [Note] WSREP: Processing event queue:...100.0% (402/402 events) complete. 
    2023-03-19  1:23:40 0 [Note] WSREP: Shifting JOINED -> SYNCED (TO: 61149756) 
    2023-03-19  1:23:43 2 [Note] WSREP: Server DB4 synced with group.
    2023-03-19  1:23:43 2 [Note] WSREP: Server status change joined -> synced 
    2023-03-19  1:23:43 2 [Note] WSREP: Synchronized with group, ready for connections 
    2023-03-19  1:23:43 2 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification. 
    2023-03-19  1:23:59 0 [Warning] WSREP: Failed to report last committed 6caf7137-be5b-11ed-952d-8e2998c314a7:61150337, -110 (Connection timed out) 
    2023-03-19  1:24:15 0 [Warning] WSREP: Failed to report last committed 6caf7137-be5b-11ed-952d-8e2998c314a7:61151132, -110 (Connection timed out) 
    2023-03-19  1:24:17 0 [Warning] WSREP: Failed to report last committed 6caf7137-be5b-11ed-952d-8e2998c314a7:61151205, -110 (Connection timed out) 
    2023-03-19  1:24:23 0 [Warning] WSREP: Failed to report last committed 6caf7137-be5b-11ed-952d-8e2998c314a7:61151478, -110 (Connection timed out) 
    2023-03-19  1:24:33 0 [Warning] WSREP: Failed to report last committed 6caf7137-be5b-11ed-952d-8e2998c314a7:61151724, -110 (Connection timed out) 
    2023-03-19  1:24:38 0 [Note] InnoDB: Buffer pool(s) load completed at 230319  1:24:38 
    2023-03-19  1:24:41 0 [Warning] WSREP: Failed to report last committed 6caf7137-be5b-11ed-952d-8e2998c314a7:61152114, -110 (Connection timed out) 
    2023-03-19  1:24:43 0 [Warning] WSREP: Failed to report last committed 6caf7137-be5b-11ed-952d-8e2998c314a7:61152261, -110 (Connection timed out) 
    2023-03-19  1:24:46 0 [Warning] WSREP: Failed to report last committed 6caf7137-be5b-11ed-952d-8e2998c314a7:61152364, -110 (Connection timed out) 
    2023-03-19  1:27:32 0 [Warning] WSREP: Failed to report last committed 6caf7137-be5b-11ed-952d-8e2998c314a7:61158126, -110 (Connection timed out) 
    2023-03-19  1:27:36 0 [Warning] WSREP: Failed to report last committed 6caf7137-be5b-11ed-952d-8e2998c314a7:61158269, -110 (Connection timed out) 
    2023-03-19  1:27:40 0 [Warning] WSREP: Failed to report last committed 6caf7137-be5b-11ed-952d-8e2998c314a7:61158385, -110 (Connection timed out) 
    2023-03-19  1:27:47 0 [Warning] WSREP: Failed to report last committed 6caf7137-be5b-11ed-952d-8e2998c314a7:61158486, -110 (Connection timed out) 
    2023-03-19  1:28:12 0 [Warning] WSREP: Failed to report last committed 6caf7137-be5b-11ed-952d-8e2998c314a7:61159789, -110 (Connection timed out) 
    2023-03-19  1:28:15 0 [Warning] WSREP: Failed to report last committed 6caf7137-be5b-11ed-952d-8e2998c314a7:61159889, -110 (Connection timed out) 
    2023-03-19  1:28:18 0 [Warning] WSREP: Failed to report last committed 6caf7137-be5b-11ed-952d-8e2998c314a7:61160056, -110 (Connection timed out) 
    2023-03-19  1:28:21 0 [Warning] WSREP: Failed to report last committed 6caf7137-be5b-11ed-952d-8e2998c314a7:61160171, -110 (Connection timed out)

My wsrep config:

    [galera]
    wsrep_on=ON
    wsrep_provider=/usr/lib/libgalera_smm.so                      
    wsrep_cluster_address="gcomm://xx.xx.xx.xxx,xxx.xx.xx.xxx,xx.xxx.xx.xxx,xx.xxx.xxx.xx"
    wsrep_sst_method=mariabackup
    wsrep_sst_auth=xxxxxxxxx:xxxxxxxxx
    binlog_format=row
    default_storage_engine=InnoDB
    innodb_autoinc_lock_mode=2
    wsrep_cluster_name="xxxxxxx"
    wsrep_node_address="xx.xxx.xxx.xx"
    wsrep_node_name="DBX"

What is causing this performance issue and how to fix it? Help me please.

uzurik
  • 31
  • 3

0 Answers0