Galera mysql node joined the cluster unsuccessfully after being abnormally stopped

Question

We have three nodes in Galera cluster with IPs 172.21.100.23, 172.21.100.24 and 172.21.100.25. The cluster had run successfully for a year.

Recently I changed my.cnf on 23 to add some bin log configurations. After that, I restarted mysql on 23,the start command 'systemctl start mysqld' suspended, and it can not be stopped using 'systemctl stop mysqld'. So I found out mysql process id and killed it. After that I removed the added configurations in my.cnf on 23, restoring it to the state before, but I still can not start mysql successfully.

I had looked into grastate.dat in datadir, it has following content:

version 2.1
UUID: 00000000-0000-0000-0000-000000000000
seqno:-1
safe_to_bootstrap:0

I had run 'mysqld --wsrep-recover' on 23, it executed successfully, but it had no effect. In mysqld.log, there are some lines generated after executing 'mysqld --wsrep-recover',

Found saved state:00000000-0000-0000-0000-000000000000:-1, safe_to_bootstrap:0

it is the same as in grastate.dat.

Following is the my.cnf on 23:

[mysqld]
datadir=/data/g_mysql_data/data
socket=/var/lib/mysql/mysql.sock
user=mysql
#bind-address=172.21.100.23
default_storage_engine=innodb
innodb_autoinc_lock_mode=2
innodb_flush_log_at_trx_commit=0
innodb_buffer_pool_size=2000M
wsrep_provider=/usr/lib64/galera-3/libgalera_smm.so
wsrep_provider_options="gcache.size=5000M; gcache.page_size=300M"
wsrep_cluster_name="galera_cluster1"
wsrep_cluster_address="gcomm://172.21.100.23,172.21.100.24,172.21.100.25"
wsrep_sst_method=rsync
server_id=1
wsrep_node_address="172.21.100.23"
wsrep_node_name="gcm1"
log-error=/data/g_mysql_log/mysqld.log
pid-file=/data/g_mysql_log/mysqld.pid
sql_mode=STRICT_TRANS_TABLES,NO_ZERO_IN_DATE,NO_ZERO_DATE,ERROR_FOR_DIVISION_BY_ZERO,NO_AUTO_CREATE_USER,NO_ENGINE_SUBSTITUTION
log_output=TABLE
max_connections=500
group_concat_max_len=102400
innodb_log_file_size=1024M
innodb_strict_mode=0
[mysql_safe]

Following is mysqld.log:

2022-01-13T03:42:22.176244Z 0 [Note] InnoDB: Log scan progressed past the checkpoint lsn 254788306500
2022-01-13T03:42:22.176267Z 0 [Note] InnoDB: Doing recovery: scanned up to log sequence number 254788308116
2022-01-13T03:42:22.176616Z 0 [Note] InnoDB: Database was not shutdown normally!
2022-01-13T03:42:22.176622Z 0 [Note] InnoDB: Starting crash recovery.
2022-01-13T03:42:22.190460Z 0 [Note] InnoDB: Starting an apply batch of log records to the database...
InnoDB: Progress in percent: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 03:42:22 UTC - mysqld got signal 11 ;
2022-01-13 11:42:22 0x7ff8f1da6700This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.
  InnoDB: Assertion failure in thread 140707186239232 in file log0recv.cc line 1930
Attempting to collect some information that could help diagnose the problem.
As this is a crash and something is definitely wrong, the information
collection process might fail.

InnoDB: Failing assertion: !page || (ibool)!!page_is_comp(page) == dict_table_is_comp(index->table)
key_buffer_size=8388608
InnoDB: We intentionally generate a memory trap.
InnoDB: Submit a detailed bug report to http://bugs.mysql.com.
InnoDB: If you get repeated assertion failures or crashes, even
InnoDB: immediately after the mysqld startup, there may be
InnoDB: corruption in the InnoDB tablespace. Please refer to
InnoDB: http://dev.mysql.com/doc/refman/5.7/en/forcing-innodb-recovery.html
InnoDB: about forcing recovery.
read_buffer_size=131072
max_used_connections=0
max_threads=500
thread_count=0
connection_count=0
Fatal signal 6 while backtracing
It is possible that mysqld could use up to 
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 207164 K  bytes of memory
Hope that's ok; if not, decrease some variables in the equation.

Thread pointer: 0x0
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
2022-01-13T03:42:22.559205Z 0 [Warning] TIMESTAMP with implicit DEFAULT value is deprecated. Please use --explicit_defaults_for_timestamp server option (see documentation for more details).
2022-01-13T03:42:22.560595Z 0 [Note] /usr/sbin/mysqld (mysqld 5.7.30) starting as process 229812 ...
2022-01-13T03:42:22.563894Z 0 [Note] WSREP: Read nil XID from storage engines, skipping position init
2022-01-13T03:42:22.563910Z 0 [Note] WSREP: wsrep_load(): loading provider library '/usr/lib64/galera-3/libgalera_smm.so'
2022-01-13T03:42:22.564682Z 0 [Note] WSREP: wsrep_load(): Galera 3.30(r4e1a604) by Codership Oy <info@codership.com> loaded successfully.
2022-01-13T03:42:22.564704Z 0 [Note] WSREP: CRC-32C: using hardware acceleration.
2022-01-13T03:42:22.565018Z 0 [Note] WSREP: Found saved state: 00000000-0000-0000-0000-000000000000:-1, safe_to_bootstrap: 0
2022-01-13T03:42:22.566424Z 0 [Note] WSREP: Passing config to GCS: base_dir = /data/g_mysql_data/data/; base_host = 172.21.100.23; base_port = 4567; cert.log_conflicts = no; cert.optimistic_pa = yes; debug = no; evs.auto_evict = 0; evs.delay_margin = PT1S; evs.delayed_keep_period = PT30S; evs.inactive_check_period = PT0.5S; evs.inactive_timeout = PT15S; evs.join_retrans_period = PT1S; evs.max_install_timeouts = 3; evs.send_window = 4; evs.stats_report_period = PT1M; evs.suspect_timeout = PT5S; evs.user_send_window = 2; evs.view_forget_timeout = PT24H; gcache.dir = /data/g_mysql_data/data/; gcache.keep_pages_size = 0; gcache.mem_size = 0; gcache.name = /data/g_mysql_data/data//galera.cache; gcache.page_size = 300M; gcache.recover = no; gcache.size = 5000M; gcomm.thread_prio = ; gcs.fc_debug = 0; gcs.fc_factor = 1.0; gcs.fc_limit = 16; gcs.fc_master_slave = no; gcs.max_packet_size = 64500; gcs.max_throttle = 0.25; gcs.recv_q_hard_limit = 9223372036854775807; gcs.recv_q_soft_limit = 0.25; gcs.sync_donor = no; gmcast.segment = 0; gmcast.version = 0
2022-01-13T03:42:22.589530Z 0 [Note] WSREP: GCache history reset: 60a4edc4-d099-11ea-ba22-c3903525a426:0 -> 00000000-0000-0000-0000-000000000000:-1
2022-01-13T03:42:22.589730Z 0 [Note] WSREP: Assign initial position for certification: -1, protocol version: -1
2022-01-13T03:42:22.589767Z 0 [Note] WSREP: wsrep_sst_grab()
2022-01-13T03:42:22.589776Z 0 [Note] WSREP: Start replication
2022-01-13T03:42:22.589800Z 0 [Note] WSREP: Setting initial position to 00000000-0000-0000-0000-000000000000:-1
2022-01-13T03:42:22.589962Z 0 [Note] WSREP: protonet asio version 0
2022-01-13T03:42:22.590183Z 0 [Note] WSREP: Using CRC-32C for message checksums.
2022-01-13T03:42:22.590241Z 0 [Note] WSREP: backend: asio
2022-01-13T03:42:22.590325Z 0 [Note] WSREP: gcomm thread scheduling priority set to other:0 
2022-01-13T03:42:22.590497Z 0 [Warning] WSREP: access file(/data/g_mysql_data/data//gvwstate.dat) failed(No such file or directory)
2022-01-13T03:42:22.590510Z 0 [Note] WSREP: restore pc from disk failed
2022-01-13T03:42:22.590940Z 0 [Note] WSREP: GMCast version 0
2022-01-13T03:42:22.591482Z 0 [Note] WSREP: (d101bfe8, 'tcp://0.0.0.0:4567') listening at tcp://0.0.0.0:4567
2022-01-13T03:42:22.591499Z 0 [Note] WSREP: (d101bfe8, 'tcp://0.0.0.0:4567') multicast: , ttl: 1
2022-01-13T03:42:22.592211Z 0 [Note] WSREP: EVS version 0
2022-01-13T03:42:22.592444Z 0 [Note] WSREP: gcomm: connecting to group 'galera_cluster1', peer '172.21.100.23:,172.21.100.24:,172.21.100.25:'
2022-01-13T03:42:22.593730Z 0 [Note] WSREP: (d101bfe8, 'tcp://0.0.0.0:4567') Found matching local endpoint for a connection, blacklisting address tcp://172.21.100.23:4567
2022-01-13T03:42:22.594410Z 0 [Note] WSREP: (d101bfe8, 'tcp://0.0.0.0:4567') connection established to 1c9e42c9 tcp://172.21.100.24:4567
2022-01-13T03:42:22.594581Z 0 [Note] WSREP: (d101bfe8, 'tcp://0.0.0.0:4567') turning message relay requesting on, nonlive peers: 
2022-01-13T03:42:22.594628Z 0 [Note] WSREP: (d101bfe8, 'tcp://0.0.0.0:4567') connection established to 2bc6a038 tcp://172.21.100.25:4567
2022-01-13T03:42:23.095574Z 0 [Note] WSREP: declaring 1c9e42c9 at tcp://172.21.100.24:4567 stable
2022-01-13T03:42:23.095627Z 0 [Note] WSREP: declaring 2bc6a038 at tcp://172.21.100.25:4567 stable
2022-01-13T03:42:23.096046Z 0 [Note] WSREP: Node 1c9e42c9 state prim
2022-01-13T03:42:23.096446Z 0 [Note] WSREP: view(view_id(PRIM,1c9e42c9,105) memb {
    1c9e42c9,0
    2bc6a038,0
    d101bfe8,0
} joined {
} left {
} partitioned {
})
2022-01-13T03:42:23.096484Z 0 [Note] WSREP: save pc into disk
2022-01-13T03:42:23.595866Z 0 [Note] WSREP: gcomm: connected
2022-01-13T03:42:23.595922Z 0 [Note] WSREP: Changing maximum packet size to 64500, resulting msg size: 32636
2022-01-13T03:42:23.595999Z 0 [Note] WSREP: Shifting CLOSED -> OPEN (TO: 0)
2022-01-13T03:42:23.596010Z 0 [Note] WSREP: Opened channel 'galera_cluster1'
2022-01-13T03:42:23.596154Z 0 [Note] WSREP: New COMPONENT: primary = yes, bootstrap = no, my_idx = 2, memb_num = 3
2022-01-13T03:42:23.596228Z 0 [Note] WSREP: STATE EXCHANGE: Waiting for state UUID.
2022-01-13T03:42:23.596239Z 0 [Note] WSREP: Waiting for SST to complete.
2022-01-13T03:42:23.596294Z 0 [Note] WSREP: STATE EXCHANGE: sent state msg: d1514dd1-7422-11ec-a8c5-9ad193910923
2022-01-13T03:42:23.596312Z 0 [Note] WSREP: STATE EXCHANGE: got state msg: d1514dd1-7422-11ec-a8c5-9ad193910923 from 0 (gcm2)
2022-01-13T03:42:23.596338Z 0 [Note] WSREP: STATE EXCHANGE: got state msg: d1514dd1-7422-11ec-a8c5-9ad193910923 from 1 (gcm3)
2022-01-13T03:42:23.596883Z 0 [Note] WSREP: STATE EXCHANGE: got state msg: d1514dd1-7422-11ec-a8c5-9ad193910923 from 2 (gcm1)
2022-01-13T03:42:23.596907Z 0 [Note] WSREP: Quorum results:
    version    = 6,
    component  = PRIMARY,
    conf_id    = 104,
    members    = 2/3 (joined/total),
    act_id     = 257246765,
    last_appl. = -1,
    protocols  = 0/9/3 (gcs/repl/appl),
    group UUID = 60a4edc4-d099-11ea-ba22-c3903525a426
2022-01-13T03:42:23.596922Z 0 [Note] WSREP: Flow-control interval: [28, 28]
2022-01-13T03:42:23.596933Z 0 [Note] WSREP: Shifting OPEN -> PRIMARY (TO: 257246765)
2022-01-13T03:42:23.597050Z 2 [Note] WSREP: State transfer required: 
    Group state: 60a4edc4-d099-11ea-ba22-c3903525a426:257246765
    Local state: 00000000-0000-0000-0000-000000000000:-1
2022-01-13T03:42:23.597103Z 2 [Note] WSREP: REPL Protocols: 9 (4, 2)
2022-01-13T03:42:23.597130Z 2 [Note] WSREP: New cluster view: global state: 60a4edc4-d099-11ea-ba22-c3903525a426:257246765, view# 105: Primary, number of nodes: 3, my index: 2, protocol version 3
2022-01-13T03:42:23.597140Z 2 [Warning] WSREP: Gap in state sequence. Need state transfer.
2022-01-13T03:42:23.597287Z 0 [Note] WSREP: Running: 'wsrep_sst_rsync --role 'joiner' --address '172.21.100.23' --datadir '/data/g_mysql_data/data/' --defaults-file '/etc/my.cnf' --defaults-group-suffix '' --parent '229812'  '' '
2022-01-13T03:42:23.740222Z 2 [Note] WSREP: Prepared SST request: rsync|172.21.100.23:4444/rsync_sst
2022-01-13T03:42:23.740273Z 2 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
2022-01-13T03:42:23.740313Z 2 [Note] WSREP: Assign initial position for certification: 257246765, protocol version: 4
2022-01-13T03:42:23.740360Z 0 [Note] WSREP: Service thread queue flushed.
2022-01-13T03:42:23.740490Z 2 [Warning] WSREP: Failed to prepare for incremental state transfer: Local state UUID (00000000-0000-0000-0000-000000000000) does not match group state UUID (60a4edc4-d099-11ea-ba22-c3903525a426): 1 (Operation not permitted)
     at galera/src/replicator_str.cpp:prepare_for_IST():463. IST will be unavailable.
2022-01-13T03:42:23.741249Z 0 [Note] WSREP: Member 2.0 (gcm1) requested state transfer from '*any*'. Selected 0.0 (gcm2)(SYNCED) as donor.
2022-01-13T03:42:23.741287Z 0 [Note] WSREP: Shifting PRIMARY -> JOINER (TO: 257246766)
2022-01-13T03:42:23.741323Z 2 [Note] WSREP: Requesting state transfer: success, donor: 0
2022-01-13T03:42:23.741352Z 2 [Note] WSREP: GCache history reset: 00000000-0000-0000-0000-000000000000:0 -> 60a4edc4-d099-11ea-ba22-c3903525a426:257246765
2022-01-13T03:42:25.598167Z 0 [Note] WSREP: (d101bfe8, 'tcp://0.0.0.0:4567') turning message relay requesting off

Any help will be greatly appreciated.

Galera mysql node joined the cluster unsuccessfully after being abnormally stopped

0 Answers0