1

We have three nodes in Galera cluster with IPs 172.21.100.23, 172.21.100.24 and 172.21.100.25. The cluster had run successfully for a year.

Recently I changed my.cnf on 23 to add some bin log configurations. After that, I restarted mysql on 23,the start command 'systemctl start mysqld' suspended, and it can not be stopped using 'systemctl stop mysqld'. So I found out mysql process id and killed it. After that I removed the added configurations in my.cnf on 23, restoring it to the state before, but I still can not start mysql successfully.

I had looked into grastate.dat in datadir, it has following content:

version 2.1
UUID: 00000000-0000-0000-0000-000000000000
seqno:-1
safe_to_bootstrap:0

I had run 'mysqld --wsrep-recover' on 23, it executed successfully, but it had no effect. In mysqld.log, there are some lines generated after executing 'mysqld --wsrep-recover',

Found saved state:00000000-0000-0000-0000-000000000000:-1, safe_to_bootstrap:0

it is the same as in grastate.dat.

Following is the my.cnf on 23:

[mysqld]
datadir=/data/g_mysql_data/data
socket=/var/lib/mysql/mysql.sock
user=mysql
#bind-address=172.21.100.23
default_storage_engine=innodb
innodb_autoinc_lock_mode=2
innodb_flush_log_at_trx_commit=0
innodb_buffer_pool_size=2000M
wsrep_provider=/usr/lib64/galera-3/libgalera_smm.so
wsrep_provider_options="gcache.size=5000M; gcache.page_size=300M"
wsrep_cluster_name="galera_cluster1"
wsrep_cluster_address="gcomm://172.21.100.23,172.21.100.24,172.21.100.25"
wsrep_sst_method=rsync
server_id=1
wsrep_node_address="172.21.100.23"
wsrep_node_name="gcm1"
log-error=/data/g_mysql_log/mysqld.log
pid-file=/data/g_mysql_log/mysqld.pid
sql_mode=STRICT_TRANS_TABLES,NO_ZERO_IN_DATE,NO_ZERO_DATE,ERROR_FOR_DIVISION_BY_ZERO,NO_AUTO_CREATE_USER,NO_ENGINE_SUBSTITUTION
log_output=TABLE
max_connections=500
group_concat_max_len=102400
innodb_log_file_size=1024M
innodb_strict_mode=0
[mysql_safe]

Following is mysqld.log:

2022-01-13T03:42:22.176244Z 0 [Note] InnoDB: Log scan progressed past the checkpoint lsn 254788306500
2022-01-13T03:42:22.176267Z 0 [Note] InnoDB: Doing recovery: scanned up to log sequence number 254788308116
2022-01-13T03:42:22.176616Z 0 [Note] InnoDB: Database was not shutdown normally!
2022-01-13T03:42:22.176622Z 0 [Note] InnoDB: Starting crash recovery.
2022-01-13T03:42:22.190460Z 0 [Note] InnoDB: Starting an apply batch of log records to the database...
InnoDB: Progress in percent: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 03:42:22 UTC - mysqld got signal 11 ;
2022-01-13 11:42:22 0x7ff8f1da6700This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.
  InnoDB: Assertion failure in thread 140707186239232 in file log0recv.cc line 1930
Attempting to collect some information that could help diagnose the problem.
As this is a crash and something is definitely wrong, the information
collection process might fail.

InnoDB: Failing assertion: !page || (ibool)!!page_is_comp(page) == dict_table_is_comp(index->table)
key_buffer_size=8388608
InnoDB: We intentionally generate a memory trap.
InnoDB: Submit a detailed bug report to http://bugs.mysql.com.
InnoDB: If you get repeated assertion failures or crashes, even
InnoDB: immediately after the mysqld startup, there may be
InnoDB: corruption in the InnoDB tablespace. Please refer to
InnoDB: http://dev.mysql.com/doc/refman/5.7/en/forcing-innodb-recovery.html
InnoDB: about forcing recovery.
read_buffer_size=131072
max_used_connections=0
max_threads=500
thread_count=0
connection_count=0
Fatal signal 6 while backtracing
It is possible that mysqld could use up to 
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 207164 K  bytes of memory
Hope that's ok; if not, decrease some variables in the equation.

Thread pointer: 0x0
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
2022-01-13T03:42:22.559205Z 0 [Warning] TIMESTAMP with implicit DEFAULT value is deprecated. Please use --explicit_defaults_for_timestamp server option (see documentation for more details).
2022-01-13T03:42:22.560595Z 0 [Note] /usr/sbin/mysqld (mysqld 5.7.30) starting as process 229812 ...
2022-01-13T03:42:22.563894Z 0 [Note] WSREP: Read nil XID from storage engines, skipping position init
2022-01-13T03:42:22.563910Z 0 [Note] WSREP: wsrep_load(): loading provider library '/usr/lib64/galera-3/libgalera_smm.so'
2022-01-13T03:42:22.564682Z 0 [Note] WSREP: wsrep_load(): Galera 3.30(r4e1a604) by Codership Oy <info@codership.com> loaded successfully.
2022-01-13T03:42:22.564704Z 0 [Note] WSREP: CRC-32C: using hardware acceleration.
2022-01-13T03:42:22.565018Z 0 [Note] WSREP: Found saved state: 00000000-0000-0000-0000-000000000000:-1, safe_to_bootstrap: 0
2022-01-13T03:42:22.566424Z 0 [Note] WSREP: Passing config to GCS: base_dir = /data/g_mysql_data/data/; base_host = 172.21.100.23; base_port = 4567; cert.log_conflicts = no; cert.optimistic_pa = yes; debug = no; evs.auto_evict = 0; evs.delay_margin = PT1S; evs.delayed_keep_period = PT30S; evs.inactive_check_period = PT0.5S; evs.inactive_timeout = PT15S; evs.join_retrans_period = PT1S; evs.max_install_timeouts = 3; evs.send_window = 4; evs.stats_report_period = PT1M; evs.suspect_timeout = PT5S; evs.user_send_window = 2; evs.view_forget_timeout = PT24H; gcache.dir = /data/g_mysql_data/data/; gcache.keep_pages_size = 0; gcache.mem_size = 0; gcache.name = /data/g_mysql_data/data//galera.cache; gcache.page_size = 300M; gcache.recover = no; gcache.size = 5000M; gcomm.thread_prio = ; gcs.fc_debug = 0; gcs.fc_factor = 1.0; gcs.fc_limit = 16; gcs.fc_master_slave = no; gcs.max_packet_size = 64500; gcs.max_throttle = 0.25; gcs.recv_q_hard_limit = 9223372036854775807; gcs.recv_q_soft_limit = 0.25; gcs.sync_donor = no; gmcast.segment = 0; gmcast.version = 0
2022-01-13T03:42:22.589530Z 0 [Note] WSREP: GCache history reset: 60a4edc4-d099-11ea-ba22-c3903525a426:0 -> 00000000-0000-0000-0000-000000000000:-1
2022-01-13T03:42:22.589730Z 0 [Note] WSREP: Assign initial position for certification: -1, protocol version: -1
2022-01-13T03:42:22.589767Z 0 [Note] WSREP: wsrep_sst_grab()
2022-01-13T03:42:22.589776Z 0 [Note] WSREP: Start replication
2022-01-13T03:42:22.589800Z 0 [Note] WSREP: Setting initial position to 00000000-0000-0000-0000-000000000000:-1
2022-01-13T03:42:22.589962Z 0 [Note] WSREP: protonet asio version 0
2022-01-13T03:42:22.590183Z 0 [Note] WSREP: Using CRC-32C for message checksums.
2022-01-13T03:42:22.590241Z 0 [Note] WSREP: backend: asio
2022-01-13T03:42:22.590325Z 0 [Note] WSREP: gcomm thread scheduling priority set to other:0 
2022-01-13T03:42:22.590497Z 0 [Warning] WSREP: access file(/data/g_mysql_data/data//gvwstate.dat) failed(No such file or directory)
2022-01-13T03:42:22.590510Z 0 [Note] WSREP: restore pc from disk failed
2022-01-13T03:42:22.590940Z 0 [Note] WSREP: GMCast version 0
2022-01-13T03:42:22.591482Z 0 [Note] WSREP: (d101bfe8, 'tcp://0.0.0.0:4567') listening at tcp://0.0.0.0:4567
2022-01-13T03:42:22.591499Z 0 [Note] WSREP: (d101bfe8, 'tcp://0.0.0.0:4567') multicast: , ttl: 1
2022-01-13T03:42:22.592211Z 0 [Note] WSREP: EVS version 0
2022-01-13T03:42:22.592444Z 0 [Note] WSREP: gcomm: connecting to group 'galera_cluster1', peer '172.21.100.23:,172.21.100.24:,172.21.100.25:'
2022-01-13T03:42:22.593730Z 0 [Note] WSREP: (d101bfe8, 'tcp://0.0.0.0:4567') Found matching local endpoint for a connection, blacklisting address tcp://172.21.100.23:4567
2022-01-13T03:42:22.594410Z 0 [Note] WSREP: (d101bfe8, 'tcp://0.0.0.0:4567') connection established to 1c9e42c9 tcp://172.21.100.24:4567
2022-01-13T03:42:22.594581Z 0 [Note] WSREP: (d101bfe8, 'tcp://0.0.0.0:4567') turning message relay requesting on, nonlive peers: 
2022-01-13T03:42:22.594628Z 0 [Note] WSREP: (d101bfe8, 'tcp://0.0.0.0:4567') connection established to 2bc6a038 tcp://172.21.100.25:4567
2022-01-13T03:42:23.095574Z 0 [Note] WSREP: declaring 1c9e42c9 at tcp://172.21.100.24:4567 stable
2022-01-13T03:42:23.095627Z 0 [Note] WSREP: declaring 2bc6a038 at tcp://172.21.100.25:4567 stable
2022-01-13T03:42:23.096046Z 0 [Note] WSREP: Node 1c9e42c9 state prim
2022-01-13T03:42:23.096446Z 0 [Note] WSREP: view(view_id(PRIM,1c9e42c9,105) memb {
    1c9e42c9,0
    2bc6a038,0
    d101bfe8,0
} joined {
} left {
} partitioned {
})
2022-01-13T03:42:23.096484Z 0 [Note] WSREP: save pc into disk
2022-01-13T03:42:23.595866Z 0 [Note] WSREP: gcomm: connected
2022-01-13T03:42:23.595922Z 0 [Note] WSREP: Changing maximum packet size to 64500, resulting msg size: 32636
2022-01-13T03:42:23.595999Z 0 [Note] WSREP: Shifting CLOSED -> OPEN (TO: 0)
2022-01-13T03:42:23.596010Z 0 [Note] WSREP: Opened channel 'galera_cluster1'
2022-01-13T03:42:23.596154Z 0 [Note] WSREP: New COMPONENT: primary = yes, bootstrap = no, my_idx = 2, memb_num = 3
2022-01-13T03:42:23.596228Z 0 [Note] WSREP: STATE EXCHANGE: Waiting for state UUID.
2022-01-13T03:42:23.596239Z 0 [Note] WSREP: Waiting for SST to complete.
2022-01-13T03:42:23.596294Z 0 [Note] WSREP: STATE EXCHANGE: sent state msg: d1514dd1-7422-11ec-a8c5-9ad193910923
2022-01-13T03:42:23.596312Z 0 [Note] WSREP: STATE EXCHANGE: got state msg: d1514dd1-7422-11ec-a8c5-9ad193910923 from 0 (gcm2)
2022-01-13T03:42:23.596338Z 0 [Note] WSREP: STATE EXCHANGE: got state msg: d1514dd1-7422-11ec-a8c5-9ad193910923 from 1 (gcm3)
2022-01-13T03:42:23.596883Z 0 [Note] WSREP: STATE EXCHANGE: got state msg: d1514dd1-7422-11ec-a8c5-9ad193910923 from 2 (gcm1)
2022-01-13T03:42:23.596907Z 0 [Note] WSREP: Quorum results:
    version    = 6,
    component  = PRIMARY,
    conf_id    = 104,
    members    = 2/3 (joined/total),
    act_id     = 257246765,
    last_appl. = -1,
    protocols  = 0/9/3 (gcs/repl/appl),
    group UUID = 60a4edc4-d099-11ea-ba22-c3903525a426
2022-01-13T03:42:23.596922Z 0 [Note] WSREP: Flow-control interval: [28, 28]
2022-01-13T03:42:23.596933Z 0 [Note] WSREP: Shifting OPEN -> PRIMARY (TO: 257246765)
2022-01-13T03:42:23.597050Z 2 [Note] WSREP: State transfer required: 
    Group state: 60a4edc4-d099-11ea-ba22-c3903525a426:257246765
    Local state: 00000000-0000-0000-0000-000000000000:-1
2022-01-13T03:42:23.597103Z 2 [Note] WSREP: REPL Protocols: 9 (4, 2)
2022-01-13T03:42:23.597130Z 2 [Note] WSREP: New cluster view: global state: 60a4edc4-d099-11ea-ba22-c3903525a426:257246765, view# 105: Primary, number of nodes: 3, my index: 2, protocol version 3
2022-01-13T03:42:23.597140Z 2 [Warning] WSREP: Gap in state sequence. Need state transfer.
2022-01-13T03:42:23.597287Z 0 [Note] WSREP: Running: 'wsrep_sst_rsync --role 'joiner' --address '172.21.100.23' --datadir '/data/g_mysql_data/data/' --defaults-file '/etc/my.cnf' --defaults-group-suffix '' --parent '229812'  '' '
2022-01-13T03:42:23.740222Z 2 [Note] WSREP: Prepared SST request: rsync|172.21.100.23:4444/rsync_sst
2022-01-13T03:42:23.740273Z 2 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
2022-01-13T03:42:23.740313Z 2 [Note] WSREP: Assign initial position for certification: 257246765, protocol version: 4
2022-01-13T03:42:23.740360Z 0 [Note] WSREP: Service thread queue flushed.
2022-01-13T03:42:23.740490Z 2 [Warning] WSREP: Failed to prepare for incremental state transfer: Local state UUID (00000000-0000-0000-0000-000000000000) does not match group state UUID (60a4edc4-d099-11ea-ba22-c3903525a426): 1 (Operation not permitted)
     at galera/src/replicator_str.cpp:prepare_for_IST():463. IST will be unavailable.
2022-01-13T03:42:23.741249Z 0 [Note] WSREP: Member 2.0 (gcm1) requested state transfer from '*any*'. Selected 0.0 (gcm2)(SYNCED) as donor.
2022-01-13T03:42:23.741287Z 0 [Note] WSREP: Shifting PRIMARY -> JOINER (TO: 257246766)
2022-01-13T03:42:23.741323Z 2 [Note] WSREP: Requesting state transfer: success, donor: 0
2022-01-13T03:42:23.741352Z 2 [Note] WSREP: GCache history reset: 00000000-0000-0000-0000-000000000000:0 -> 60a4edc4-d099-11ea-ba22-c3903525a426:257246765
2022-01-13T03:42:25.598167Z 0 [Note] WSREP: (d101bfe8, 'tcp://0.0.0.0:4567') turning message relay requesting off

Any help will be greatly appreciated.

Roy
  • 11
  • 2

0 Answers0