
I am having trouble initializing a Galera cluster: the second node always fails to start, with no error message in the log. I have two nodes for the moment and will install the third later. Here is my configuration:

node1: 192.168.0.21 db01
node2: 192.168.0.22 db02

The /etc/hosts files on both nodes are filled in with the hostname entries. My galera.cnf looks like this on both nodes:

[mysqld]
#mysql settings
binlog_format=ROW
default-storage-engine=innodb
innodb_autoinc_lock_mode=2
query_cache_size=0
query_cache_type=0
bind-address=0.0.0.0
#galera settings
wsrep_provider=/usr/lib/galera/libgalera_smm.so
wsrep_cluster_name="my_wsrep_cluster"
wsrep_cluster_address="gcomm://192.168.0.21,192.168.0.22"
wsrep_node_address="192.168.0.21" # 192.168.0.22 on db02
wsrep_node_name="db01" # db02 on db02 server
wsrep_sst_method=rsync
log-error=/var/log/mysql/error.log

I can start the service on db01 without trouble using this command: service mysql start --wsrep-new-cluster

But when I start db02 with service mysql start, I get a failure message, and the service does not listen on port 3306 on db02. Here are my logs, which show that db02 detects the cluster, detects db01, and needs to synchronize, but the synchronization does not seem to start:

161009 13:33:02 mysqld_safe Starting mysqld daemon with databases from /var/lib/mysql
161009 13:33:02 mysqld_safe WSREP: Running position recovery with --log_error='/var/lib/mysql/wsrep_recovery.7VTkrM' --pid-file='/var/lib/mysql/db02-recover.pid'
161009 13:33:02 [Note] /usr/sbin/mysqld (mysqld 5.5.52-MariaDB-1~wheezy-wsrep) starting as process 14839 ...
161009 13:33:04 mysqld_safe WSREP: Recovered position f89d319e-8e08-11e6-8b25-ebe3bbf9c45b:2
161009 13:33:04 [Note] WSREP: wsrep_start_position var submitted: 'f89d319e-8e08-11e6-8b25-ebe3bbf9c45b:2'
161009 13:33:04 [Note] /usr/sbin/mysqld (mysqld 5.5.52-MariaDB-1~wheezy-wsrep) starting as process 14890 ...
161009 13:33:04 [Note] WSREP: Read nil XID from storage engines, skipping position init
161009 13:33:04 [Note] WSREP: wsrep_load(): loading provider library '/usr/lib/galera/libgalera_smm.so'
161009 13:33:04 [Note] WSREP: wsrep_load(): Galera 25.3.17(r3619) by Codership Oy <info@codership.com> loaded successfully.
161009 13:33:04 [Note] WSREP: CRC-32C: using hardware acceleration.
161009 13:33:04 [Note] WSREP: Found saved state: f89d319e-8e08-11e6-8b25-ebe3bbf9c45b:-1
161009 13:33:04 [Note] WSREP: Passing config to GCS: base_dir = /var/lib/mysql/; base_host = 192.168.0.22; base_port = 4567; cert.log_conflicts = no; debug = no; evs.auto_evict = 0; evs.delay_margin = PT1S; evs.delayed_keep_period = PT30S; evs.inactive_check_period = PT0.5S; evs.inactive_timeout = PT15S; evs.join_retrans_period = PT1S; evs.max_install_timeouts = 3; evs.send_window = 4; evs.stats_report_period = PT1M; evs.suspect_timeout = PT5S; evs.user_send_window = 2; evs.view_forget_timeout = PT24H; gcache.dir = /var/lib/mysql/; gcache.keep_pages_size = 0; gcache.mem_size = 0; gcache.name = /var/lib/mysql//galera.cache; gcache.page_size = 128M; gcache.size = 128M; gcomm.thread_prio = ; gcs.fc_debug = 0; gcs.fc_factor = 1.0; gcs.fc_limit = 16; gcs.fc_master_slave = no; gcs.max_packet_size = 64500; gcs.max_throttle = 0.25; gcs.recv_q_hard_limit = 9223372036854775807; gcs.recv_q_soft_limit = 0.25; gcs.sync_donor = no; gmcast.segment = 0; gmcast.version = 0; pc.announce_timeout = PT3S; pc.checksum = false; pc.ignore_quorum = false;
161009 13:33:05 [Note] WSREP: Service thread queue flushed.
161009 13:33:05 [Note] WSREP: Assign initial position for certification: 2, protocol version: -1
161009 13:33:05 [Note] WSREP: wsrep_sst_grab()
161009 13:33:05 [Note] WSREP: Start replication
161009 13:33:05 [Note] WSREP: Setting initial position to f89d319e-8e08-11e6-8b25-ebe3bbf9c45b:2
161009 13:33:05 [Note] WSREP: protonet asio version 0
161009 13:33:05 [Note] WSREP: Using CRC-32C for message checksums.
161009 13:33:05 [Note] WSREP: backend: asio
161009 13:33:05 [Note] WSREP: gcomm thread scheduling priority set to other:0
161009 13:33:05 [Note] WSREP: restore pc from disk successfully
161009 13:33:05 [Note] WSREP: GMCast version 0
161009 13:33:05 [Note] WSREP: (def7d829, 'tcp://0.0.0.0:4567') listening at tcp://0.0.0.0:4567
161009 13:33:05 [Note] WSREP: (def7d829, 'tcp://0.0.0.0:4567') multicast: , ttl: 1
161009 13:33:05 [Note] WSREP: EVS version 0
161009 13:33:05 [Note] WSREP: gcomm: connecting to group 'my_wsrep_cluster', peer '192.168.0.21:,192.168.0.22:'
161009 13:33:05 [Note] WSREP: (def7d829, 'tcp://0.0.0.0:4567') connection established to def7d829 tcp://192.168.0.22:4567
161009 13:33:05 [Warning] WSREP: (def7d829, 'tcp://0.0.0.0:4567') address 'tcp://192.168.0.22:4567' points to own listening address, blacklisting
161009 13:33:05 [Note] WSREP: (def7d829, 'tcp://0.0.0.0:4567') connection established to def7d829 tcp://192.168.0.22:4567
161009 13:33:05 [Note] WSREP: (def7d829, 'tcp://0.0.0.0:4567') connection established to 51e5ab0b tcp://192.168.0.21:4567
161009 13:33:05 [Note] WSREP: (def7d829, 'tcp://0.0.0.0:4567') turning message relay requesting on, nonlive peers:
161009 13:33:05 [Note] WSREP: declaring 51e5ab0b at tcp://192.168.0.21:4567 stable
161009 13:33:05 [Note] WSREP: re-bootstrapping prim from partitioned components
161009 13:33:05 [Note] WSREP: view(view_id(PRIM,51e5ab0b,19) memb {
        51e5ab0b,0
        def7d829,0
} joined {
} left {
} partitioned {
})
161009 13:33:05 [Note] WSREP: save pc into disk
161009 13:33:05 [Note] WSREP: clear restored view
161009 13:33:06 [Note] WSREP: gcomm: connected
161009 13:33:06 [Note] WSREP: Changing maximum packet size to 64500, resulting msg size: 32636
161009 13:33:06 [Note] WSREP: Shifting CLOSED -> OPEN (TO: 0)
161009 13:33:06 [Note] WSREP: Opened channel 'my_wsrep_cluster'
161009 13:33:06 [Note] WSREP: Waiting for SST to complete.
161009 13:33:06 [Note] WSREP: New COMPONENT: primary = yes, bootstrap = no, my_idx = 1, memb_num = 2
161009 13:33:06 [Note] WSREP: STATE EXCHANGE: Waiting for state UUID.
161009 13:33:06 [Note] WSREP: STATE EXCHANGE: sent state msg: 2528a615-8e14-11e6-9e93-ab2afde28393
161009 13:33:06 [Note] WSREP: STATE EXCHANGE: got state msg: 2528a615-8e14-11e6-9e93-ab2afde28393 from 0 (db01)
161009 13:33:06 [Note] WSREP: STATE EXCHANGE: got state msg: 2528a615-8e14-11e6-9e93-ab2afde28393 from 1 (db02)
161009 13:33:06 [Warning] WSREP: Quorum: No node with complete state:


        Version      : 4
        Flags        : 0x3
        Protocols    : 0 / 7 / 3
        State        : NON-PRIMARY
        Desync count : 0
        Prim state   : SYNCED
        Prim UUID    : e33a5f6b-8e12-11e6-9981-7b9a76958a99
        Prim  seqno  : 2
        First seqno  : -1
        Last  seqno  : 3
        Prim JOINED  : 1
        State UUID   : 2528a615-8e14-11e6-9e93-ab2afde28393
        Group UUID   : 20836ccf-8e06-11e6-adf3-5330826fa72d
        Name         : 'db01'
        Incoming addr: '192.168.0.21:3306'

        Version      : 4
        Flags        : 00
        Protocols    : 0 / 7 / 3
        State        : NON-PRIMARY
        Desync count : 0
        Prim state   : NON-PRIMARY
        Prim UUID    : 00000000-0000-0000-0000-000000000000
        Prim  seqno  : -1
        First seqno  : -1
        Last  seqno  : 2
        Prim JOINED  : 0
        State UUID   : 2528a615-8e14-11e6-9e93-ab2afde28393
        Group UUID   : f89d319e-8e08-11e6-8b25-ebe3bbf9c45b
        Name         : 'db02'
        Incoming addr: '192.168.0.22:3306'

161009 13:33:06 [Note] WSREP: Full re-merge of primary e33a5f6b-8e12-11e6-9981-7b9a76958a99 found: 1 of 1.
161009 13:33:06 [Note] WSREP: Quorum results:
        version    = 4,
        component  = PRIMARY,
        conf_id    = 2,
        members    = 1/2 (joined/total),
        act_id     = 3,
        last_appl. = -1,
        protocols  = 0/7/3 (gcs/repl/appl),
        group UUID = 20836ccf-8e06-11e6-adf3-5330826fa72d
161009 13:33:06 [Note] WSREP: Flow-control interval: [23, 23]
161009 13:33:06 [Note] WSREP: Shifting OPEN -> PRIMARY (TO: 3)
161009 13:33:06 [Note] WSREP: State transfer required:
        Group state: 20836ccf-8e06-11e6-adf3-5330826fa72d:3
        Local state: f89d319e-8e08-11e6-8b25-ebe3bbf9c45b:2
161009 13:33:06 [Note] WSREP: New cluster view: global state: 20836ccf-8e06-11e6-adf3-5330826fa72d:3, view# 3: Primary, number of nodes: 2, my index: 1, protocol version 3
161009 13:33:06 [Warning] WSREP: Gap in state sequence. Need state transfer.
161009 13:33:06 [Note] WSREP: Running: 'wsrep_sst_rsync --role 'joiner' --address '192.168.0.22' --datadir '/var/lib/mysql/' --defaults-file '/etc/mysql/my.cnf' --parent '14890''
161009 13:33:08 [Note] WSREP: (def7d829, 'tcp://0.0.0.0:4567') turning message relay requesting off

All network communication seems OK. I have disabled the firewall for my tests, and SELinux is not in use either.

After trying to start node2, I have these connections on db01:

tcp        0      0 0.0.0.0:3306            0.0.0.0:*               LISTEN      7584/mysqld
tcp        0      0 0.0.0.0:4567            0.0.0.0:*               LISTEN      7584/mysqld
tcp        0      0 192.168.0.21:4567       192.168.0.22:53335      ESTABLISHED 7584/mysqld

I don't know whether the rsync service should already be listening on node1; maybe that is why my second node can't sync with the cluster.

So what am I missing here? Are there any errors in my configuration?

P.S.: This is the first time I have tried to install this. I followed the official guide: https://mariadb.org/installing-mariadb-galera-cluster-on-debian-ubuntu/

Kristy

1 Answer

You need to bootstrap the cluster first, with one of the nodes having the following cluster option:

wsrep_cluster_address="gcomm://"

So, start one node with that configuration, then start the other node with your setup above. Then take the first node down, restore your original wsrep_cluster_address setting on it, and start it again so that it rejoins the cluster.
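As a rough sketch of what that could look like in galera.cnf, assuming db01 is the node chosen to bootstrap (either node would do) and that the addresses are the ones from the question:

```ini
# Phase 1: db01 only, temporarily, while bootstrapping.
# An empty gcomm:// address makes the node form a new cluster
# instead of trying to join an existing one.
[mysqld]
wsrep_cluster_address="gcomm://"

# Phase 2: db01, once db02 has joined and synced.
# Stop db01, replace the line above with the full member list
# (same value as on db02), and start it again.
# wsrep_cluster_address="gcomm://192.168.0.21,192.168.0.22"
```

db02 keeps the full member list throughout; only the bootstrapping node is changed, and only until the second node has joined.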

Tero Kilkanen