
Configuration:

  • Server #1: 1 mgm node (#49), 1 data node (#1), 1 sql node (Real IP 192.168.1.128)
  • Server #2: 1 mgm node (#50), 1 data node (#2), 1 sql node (Real IP 192.168.1.130)
  • Virtual IP: 192.168.1.240 (using keepalived, server #1 as master)
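
For reference, a minimal keepalived sketch for this VIP (the interface name, virtual_router_id, priorities, and password below are illustrative assumptions, not taken from the actual setup):

# /etc/keepalived/keepalived.conf on server #1
# (use state BACKUP and a lower priority on server #2)
vrrp_instance VI_1 {
    state MASTER
    interface eth0                # assumed interface name
    virtual_router_id 51          # assumed; must match on both servers
    priority 150                  # e.g. 100 on server #2
    advert_int 1
    authentication {
        auth_type PASS
        auth_pass changeme        # placeholder
    }
    virtual_ipaddress {
        192.168.1.240
    }
}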

Specification:

  • MySQL Cluster 7.3.6 x86_64
  • Debian 7.6 x86_64

It was deployed using the MySQL Cluster Auto-Installer, and everything works fine.
However, when I shut down one node, the data node on the other server gets restarted. ndb_mgm shows it as "starting", and it takes a long time to leave the "starting" state.
In my tests, this does not happen when there are four nodes.
Does anyone know what causes this restart?
Thanks in advance.

Update: configuration files and command line parameters
1. Config file for NDB_MGMD #49

#
# Configuration file for MyCluster NDB_MGMD #49
# /usr/local/mysql/data/49/config.ini

[NDB_MGMD DEFAULT]
Portnumber=1186

[NDB_MGMD]
NodeId=49
HostName=192.168.1.128
DataDir=/usr/local/mysql/data/49/
Portnumber=1186

[NDB_MGMD]
NodeId=50
HostName=192.168.1.130
DataDir=/usr/local/mysql/data/50/
Portnumber=1186

[TCP DEFAULT]
SendBufferMemory=4M
ReceiveBufferMemory=4M

[NDBD DEFAULT]
BackupMaxWriteSize=1M
BackupDataBufferSize=16M
BackupLogBufferSize=4M
BackupMemory=20M
BackupReportFrequency=10
MemReportFrequency=30
LogLevelStartup=15
LogLevelShutdown=15
LogLevelCheckpoint=8
LogLevelNodeRestart=15
DataMemory=1M
IndexMemory=1M
MaxNoOfTables=4096
MaxNoOfTriggers=3500
NoOfReplicas=2
StringMemory=25
DiskPageBufferMemory=64M
SharedGlobalMemory=20M
LongMessageBuffer=32M
MaxNoOfConcurrentTransactions=16384
BatchSizePerLocalScan=512
FragmentLogFileSize=64M
NoOfFragmentLogFiles=16
RedoBuffer=32M
MaxNoOfExecutionThreads=2
StopOnError=false
LockPagesInMainMemory=1
TimeBetweenEpochsTimeout=32000
TimeBetweenWatchdogCheckInitial=60000
TransactionInactiveTimeout=60000
HeartbeatIntervalDbDb=15000
HeartbeatIntervalDbApi=15000

[NDBD]
NodeId=1
HostName=192.168.1.128
DataDir=/usr/local/mysql/data/1/

[NDBD]
NodeId=2
HostName=192.168.1.130
DataDir=/usr/local/mysql/data/2/

[MYSQLD DEFAULT]

[MYSQLD]
NodeId=53
HostName=192.168.1.128

[MYSQLD]
NodeId=54
HostName=192.168.1.130

2. Config file for NDB_MGMD #50

#
# Configuration file for MyCluster NDB_MGMD #50
# /usr/local/mysql/data/50/config.ini

[NDB_MGMD DEFAULT]
Portnumber=1186

[NDB_MGMD]
NodeId=49
HostName=192.168.1.128
DataDir=/usr/local/mysql/data/49/
Portnumber=1186

[NDB_MGMD]
NodeId=50
HostName=192.168.1.130
DataDir=/usr/local/mysql/data/50/
Portnumber=1186

[TCP DEFAULT]
SendBufferMemory=4M
ReceiveBufferMemory=4M

[NDBD DEFAULT]
BackupMaxWriteSize=1M
BackupDataBufferSize=16M
BackupLogBufferSize=4M
BackupMemory=20M
BackupReportFrequency=10
MemReportFrequency=30
LogLevelStartup=15
LogLevelShutdown=15
LogLevelCheckpoint=8
LogLevelNodeRestart=15
DataMemory=1M
IndexMemory=1M
MaxNoOfTables=4096
MaxNoOfTriggers=3500
NoOfReplicas=2
StringMemory=25
DiskPageBufferMemory=64M
SharedGlobalMemory=20M
LongMessageBuffer=32M
MaxNoOfConcurrentTransactions=16384
BatchSizePerLocalScan=512
FragmentLogFileSize=64M
NoOfFragmentLogFiles=16
RedoBuffer=32M
MaxNoOfExecutionThreads=2
StopOnError=false
LockPagesInMainMemory=1
TimeBetweenEpochsTimeout=32000
TimeBetweenWatchdogCheckInitial=60000
TransactionInactiveTimeout=60000
HeartbeatIntervalDbDb=15000
HeartbeatIntervalDbApi=15000

[NDBD]
NodeId=1
HostName=192.168.1.128
DataDir=/usr/local/mysql/data/1/

[NDBD]
NodeId=2
HostName=192.168.1.130
DataDir=/usr/local/mysql/data/2/

[MYSQLD DEFAULT]

[MYSQLD]
NodeId=53
HostName=192.168.1.128

[MYSQLD]
NodeId=54
HostName=192.168.1.130

Command-line parameters:
1. To start ndb_mgmd on server #1

/usr/local/mysql/bin/ndb_mgmd --initial --ndb-nodeid=49 \
--config-dir=/usr/local/mysql/data/49/ \
--config-file=/usr/local/mysql/data/49/config.ini

2. To start ndb_mgmd on server #2

/usr/local/mysql/bin/ndb_mgmd --initial --ndb-nodeid=50 \
--config-dir=/usr/local/mysql/data/50/ \
--config-file=/usr/local/mysql/data/50/config.ini

3. To start ndbmtd on server #1

/usr/local/mysql/bin/ndbmtd --ndb-nodeid=1 --bind-address=192.168.1.128 \
--ndb-connectstring=192.168.1.240:1186

4. To start ndbmtd on server #2

/usr/local/mysql/bin/ndbmtd --ndb-nodeid=2 --bind-address=192.168.1.130 \
--ndb-connectstring=192.168.1.240:1186
– Rad
1 Answer


There is a problem with your two-node setup. If you have a network problem (a split-brain condition), the two nodes cannot see each other, so both decide to shut down. They then restart, but each has to wait for the other node unless --nowait-nodes is specified.
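
For example, based on the ndbmtd commands from the question, data node #1 could be told not to wait for node #2 at startup (a sketch only; it lets a node start without its peer, so use with care):

/usr/local/mysql/bin/ndbmtd --ndb-nodeid=1 --bind-address=192.168.1.128 \
--ndb-connectstring=192.168.1.240:1186 --nowait-nodes=2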

With four nodes, a network failure splits your cluster 3/1, so the side that still has connectivity has enough quorum to validate its mgm node as arbitrator and becomes the master.

You should resolve this either by placing the mgm node on a third machine (it is a really lightweight process, so no special resources are needed) or by using a cluster manager and binding the mgm service to the VIP. Otherwise, you will lose service whenever one of the nodes has a network failure.
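
A sketch of how a third mgm node could be added to config.ini (the node id, host address, and data directory below are placeholders):

[NDB_MGMD]
NodeId=51
HostName=192.168.1.200   # placeholder address of the third machine
DataDir=/usr/local/mysql/data/51/
ArbitrationRank=1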

For the VIP configuration, the data nodes must be forced to use their real IPs:

--bind-address=name

ArbitrationTimeout should also be set high enough to allow the cluster to migrate the mgm service:
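
A sketch of that setting in the [NDBD DEFAULT] section (the value is illustrative, not a tuned recommendation):

[NDBD DEFAULT]
ArbitrationTimeout=15000   # milliseconds; give the VIP time to fail over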

For the mgm node, disabling the configuration cache makes configuration changes easier:

--config-cache=FALSE
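
For example, adapting the ndb_mgmd command from the question (same path and node id; with the cache disabled, --config-dir is not needed):

/usr/local/mysql/bin/ndb_mgmd --ndb-nodeid=49 --config-cache=FALSE \
--config-file=/usr/local/mysql/data/49/config.ini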
– Raul Andres
  • Thanks for your answer. To be clear: which node do you mean? And why doesn't it happen with four nodes? As far as I can tell, the data node on the disconnected server (interface down) goes down, but the other one gets restarted. And again, this does not happen with four nodes, even though the mgm nodes are running on the same servers. By the way, I cannot change the design. – Rad Sep 19 '14 at 07:51
  • I tried the "nowait-nodes" parameter without any success (for both types of nodes); it only prevents one node from being restarted. I found some information about "ArbitrationRank" for the MGM node, which didn't help either. I also tried using the VIP for the two MGM nodes on the servers. That doesn't work because the data node located on the same machine as the running MGM node cannot connect to it: the MGM node sees the data node communicating from the virtual IP, not the real IP. Do you have any idea how this can be resolved? Thanks – Rad Sep 23 '14 at 02:43
  • We resolved the VIP issue by forcing the data node to use the real IP, as edited. – Raul Andres Sep 23 '14 at 06:23
  • It still keeps restarting. I have added my configuration files and parameters; would you please take a look at them? Thanks in advance. – Rad Sep 24 '14 at 02:04