HAProxy but still single point of failure

Question

I'm setting up a test cluster - Maria Galera Cluster on 3 physical nodes with HAProxy. Its working, but I've made some kind of beginner error which I can't seem to resolve - so hopefully someone can cast an expert eye and help me out please?!

I have 3 physical nodes Node1: 10.1.1.120

Node2: 10.1.1.121

Node3: 10.1.1.124

Using HAProxy a virtual IP of 10.1.1.113

Up and running, when I query via the virtual IP I get...

$ mysql -uroot -pPassword -P 3306 -h 10.1.1.113 -e "select @@hostname; show processlist;"
+------------+
| @@hostname |
+------------+
| node2      |
+------------+
+----+-------------+-------------+------+---------+------+--------------------+------------------+----------+
| Id | User        | Host        | db   | Command | Time | State              | Info                 | Progress |
+----+-------------+-------------+------+---------+------+--------------------+------------------+----------+
|  1 | system user |             | NULL | Sleep   |   37 | NULL               | NULL                 |    0.000 |
|  2 | system user |             | NULL | Sleep   |   37 | wsrep aborter idle | NULL             |    0.000 |
| 45 | root        | node1:55877 | NULL | Query   |    0 | init               | show processlist |    0.000 |
+----+-------------+-------------+------+---------+------+--------------------+-

And if I do ip a on node1 - that is indeed where my virtual IP address is living BUT the hostname comes back as node2.

If I shutdown (or just disable eth0) on node1, the virtual IP address shifts elsewhere, but the @@hostname still comes back as node2.

The problem comes if I shutdown node2, then when I try to mysql using the virtual IP I get:

**ERROR 2013 (HY000): Lost connection to MySQL server at 'reading initial communication packet', system error: 0 "Internal error/check (Not system error)"**

(at this point if I log in to any of the local machines without using the virtual IP it will work).

So, it seems as if the HAProxy part is working (as this moves around appropriately), but MariaDB is trying to do its own thing and has decided everything needs to be routed via Node2.

I have no bind-address in my .cnf files. I'm using port 1306 for my sql service to avoid any conflicts with 3306 when restarting the service on a machine that happens to have the virtual IP and post 3306 at the same time.

My keepalived file is... (not sure if this is correct, but all nodes are set to master and priorities are 100,101 and 102 respectively -- doesn't seem to make a difference)

global_defs {
  router_id geordi
}
vrrp_script haproxy {
  script "killall -0 haproxy"
  interval 1
  weight 1
}

vrrp_instance 51 {
  virtual_router_id 51
  priority 101
  state MASTER
  interface eth0
  virtual_ipaddress {
    10.1.1.113 dev eth0
  }
  track_script {
    haproxy
  }
}

and my haproxy.cfg is:

global
    log /dev/log    local0
    log /dev/log    local1 notice
    chroot /var/lib/haproxy
    user haproxy
    group haproxy
    daemon

defaults
    log global
    mode    http
    option  dontlognull
    contimeout 5000
    clitimeout 50000
    srvtimeout 50000
    errorfile 400 /etc/haproxy/errors/400.http
    errorfile 403 /etc/haproxy/errors/403.http
    errorfile 408 /etc/haproxy/errors/408.http
    errorfile 500 /etc/haproxy/errors/500.http
    errorfile 502 /etc/haproxy/errors/502.http
    errorfile 503 /etc/haproxy/errors/503.http
    errorfile 504 /etc/haproxy/errors/504.http

listen mysql_proxy 10.1.1.113:3306
        mode tcp 
        balance roundrobin
        option tcpka 
        option httpchk
        option mysql-check user haproxy
        server node1 10.1.1.120:1306 check 
        server node2 10.1.1.121:1306 check 
        server node3 10.1.1.124:1306 check

Any suggestions gratefully received please -- I'm frustratingly close to getting this all working, just not quite there!

Can you bind it to the virtual IP? I think I had to do that for a heartbeat based cluster I setup awhile ago. — RyanH, Feb 23 '15 at 01:05
Sorry - bind what to the virtual IP? The virtual IP was bound (I'm a bit iffy here) to one node at a time, but this never seemed to be the node that was being displayed by @@hostname. — user3566845, Feb 23 '15 at 17:34
Nevermind, I thought it was related to MySQL. There's a few slight differences between your config and the one posted here, may be worth trying: http://www.fromdual.com/making-haproxy-high-available-for-mysql-galera-cluster — RyanH, Feb 23 '15 at 21:05

score 1 · Accepted Answer · answered Jul 04 '15 at 19:34

Explicitly set bind-address=0.0.0.0 in my.cnf.

Additionally (you've probably already done this if you are this far):

ensure each host has the ip address 10.1.1.113 (if using keepalived, then via a dummy interface as /32).
set net.ipv4.conf.default.rp_filter = 2 in /etc/sysctl.conf
set net.ipv4.conf.default.accept_source_route = 0 in /etc/sysctl.conf

This permits MySQL to LISTEN on all interfaces, and permits MySQL to RESPOND on an interface that the packet was not for.

The network interface on node1 (10.1.1.120) gets the packet as "10.1.1.13" on interface corresponding to 10.1.1.120. Normally this would be dropped saying "that's not for me". This is happening on the "Internet" layer of the TCP/IP model.

However, provisions 2 and 3 above says "just accept it, it may be for us", which then passes to the MySQL ("Application" layer of the TCP/IP model). MySQL sees that we're bound to all addresses, one of them is 10.1.1.113 (provision 1), and processes it.

HAProxy but still single point of failure

1 Answers1