I'm setting up a test cluster - Maria Galera Cluster on 3 physical nodes with HAProxy. Its working, but I've made some kind of beginner error which I can't seem to resolve - so hopefully someone can cast an expert eye and help me out please?!
I have 3 physical nodes Node1: 10.1.1.120
Node2: 10.1.1.121
Node3: 10.1.1.124
Using HAProxy a virtual IP of 10.1.1.113
Up and running, when I query via the virtual IP I get...
$ mysql -uroot -pPassword -P 3306 -h 10.1.1.113 -e "select @@hostname; show processlist;"
+------------+
| @@hostname |
+------------+
| node2 |
+------------+
+----+-------------+-------------+------+---------+------+--------------------+------------------+----------+
| Id | User | Host | db | Command | Time | State | Info | Progress |
+----+-------------+-------------+------+---------+------+--------------------+------------------+----------+
| 1 | system user | | NULL | Sleep | 37 | NULL | NULL | 0.000 |
| 2 | system user | | NULL | Sleep | 37 | wsrep aborter idle | NULL | 0.000 |
| 45 | root | node1:55877 | NULL | Query | 0 | init | show processlist | 0.000 |
+----+-------------+-------------+------+---------+------+--------------------+-
And if I do ip a on node1 - that is indeed where my virtual IP address is living BUT the hostname comes back as node2.
If I shutdown (or just disable eth0) on node1, the virtual IP address shifts elsewhere, but the @@hostname still comes back as node2.
The problem comes if I shutdown node2, then when I try to mysql using the virtual IP I get:
**ERROR 2013 (HY000): Lost connection to MySQL server at 'reading initial communication packet', system error: 0 "Internal error/check (Not system error)"**
(at this point if I log in to any of the local machines without using the virtual IP it will work).
So, it seems as if the HAProxy part is working (as this moves around appropriately), but MariaDB is trying to do its own thing and has decided everything needs to be routed via Node2.
I have no bind-address in my .cnf files. I'm using port 1306 for my sql service to avoid any conflicts with 3306 when restarting the service on a machine that happens to have the virtual IP and post 3306 at the same time.
My keepalived file is... (not sure if this is correct, but all nodes are set to master and priorities are 100,101 and 102 respectively -- doesn't seem to make a difference)
global_defs {
router_id geordi
}
vrrp_script haproxy {
script "killall -0 haproxy"
interval 1
weight 1
}
vrrp_instance 51 {
virtual_router_id 51
priority 101
state MASTER
interface eth0
virtual_ipaddress {
10.1.1.113 dev eth0
}
track_script {
haproxy
}
}
and my haproxy.cfg is:
global
log /dev/log local0
log /dev/log local1 notice
chroot /var/lib/haproxy
user haproxy
group haproxy
daemon
defaults
log global
mode http
option dontlognull
contimeout 5000
clitimeout 50000
srvtimeout 50000
errorfile 400 /etc/haproxy/errors/400.http
errorfile 403 /etc/haproxy/errors/403.http
errorfile 408 /etc/haproxy/errors/408.http
errorfile 500 /etc/haproxy/errors/500.http
errorfile 502 /etc/haproxy/errors/502.http
errorfile 503 /etc/haproxy/errors/503.http
errorfile 504 /etc/haproxy/errors/504.http
listen mysql_proxy 10.1.1.113:3306
mode tcp
balance roundrobin
option tcpka
option httpchk
option mysql-check user haproxy
server node1 10.1.1.120:1306 check
server node2 10.1.1.121:1306 check
server node3 10.1.1.124:1306 check
Any suggestions gratefully received please -- I'm frustratingly close to getting this all working, just not quite there!