
I have deployed a Redis Cluster using Kubernetes, and I am now attempting to use HAProxy to load balance it. HAProxy is great for load balancing a Redis cluster IF you have static IPs, but we don't have those when using Kubernetes. While testing failover, Redis and Kubernetes handle electing a new master and deploying a new pod, respectively. However, Kubernetes assigns a new IP to the new pod. How can we inject this new IP into the HAProxy health checks and remove the old master IP?

I have the following setup.

  +----+ +----+ +----+ +----+
  | W1 | | W2 | | W3 | | W4 |   Web application servers
  +----+ +----+ +----+ +----+
   \     |   |     /
    \    |   |    /
     \   |   |   /
      +---------+
      | HAProxy |
      +---------+
       /   \      \
   +----+ +----+ +----+
   | P1 | | P2 | | P3 |          K8S pods = Redis + Sentinel
   +----+ +----+ +----+

Which is very similar to the setup described on the haproxy blog.

Mulloy
  • If you wanted addresses that didn't change, you had the solution in k8s all along: services. Just make one service per Redis pod, e.g. redis-0, redis-1, redis-2. – neverfox May 04 '17 at 20:37
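Following up on that suggestion, a minimal sketch of creating one Service per Redis pod with the Kubernetes Python client; the service names, namespace, port, and the StatefulSet pod-name label selector are assumptions about the deployment, not details from the question:

```python
# Sketch: one ClusterIP Service per Redis pod so each pod keeps a stable address.
# Assumes the pods come from a StatefulSet named "redis" in the "default" namespace.
from kubernetes import client, config

config.load_kube_config()  # use config.load_incluster_config() when running in-cluster
v1 = client.CoreV1Api()

for pod_name in ("redis-0", "redis-1", "redis-2"):
    service = client.V1Service(
        metadata=client.V1ObjectMeta(name=pod_name),
        spec=client.V1ServiceSpec(
            # This label is set automatically on StatefulSet pods and pins the
            # Service to exactly one pod.
            selector={"statefulset.kubernetes.io/pod-name": pod_name},
            ports=[client.V1ServicePort(port=6379, target_port=6379)],
        ),
    )
    v1.create_namespaced_service(namespace="default", body=service)
```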

1 Answer


According to https://github.com/GoogleCloudPlatform/kubernetes/tree/master/examples/redis, that example uses Sentinel to manage the failover. This reduces the problem to the "normal" Sentinel-based solution.

In this case I would recommend running HAProxy in the same container as the Sentinels and using a simple Sentinel notification script to update the HAProxy config and issue a reload. With an HAProxy config that only talks to the master, this can easily be a simple search, replace, and reload script.
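A rough sketch of such a script, wired in as a Sentinel client-reconfig-script; the template path, placeholder tokens, PID file location, and reload command are assumptions about your container layout:

```python
#!/usr/bin/env python3
# Sketch: invoked by Sentinel on failover, e.g. in sentinel.conf:
#   sentinel client-reconfig-script mymaster /opt/update-haproxy.py
# Sentinel calls it with: <master-name> <role> <state> <from-ip> <from-port> <to-ip> <to-port>
import subprocess
import sys

TEMPLATE = "/etc/haproxy/haproxy.cfg.tmpl"  # contains the tokens {MASTER_IP} / {MASTER_PORT}
CONFIG = "/etc/haproxy/haproxy.cfg"
PIDFILE = "/run/haproxy.pid"


def main() -> None:
    # Only the promoted master's address matters for the rewrite.
    _name, _role, _state, _from_ip, _from_port, to_ip, to_port = sys.argv[1:8]

    with open(TEMPLATE) as f:
        rendered = f.read().replace("{MASTER_IP}", to_ip).replace("{MASTER_PORT}", to_port)
    with open(CONFIG, "w") as f:
        f.write(rendered)

    # Validate the rewritten config, then gracefully reload HAProxy.
    subprocess.run(["haproxy", "-c", "-f", CONFIG], check=True)
    with open(PIDFILE) as f:
        old_pids = f.read().split()
    subprocess.run(["haproxy", "-f", CONFIG, "-sf", *old_pids], check=True)


if __name__ == "__main__":
    main()
```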

Oh, and don't use the HAProxy check in that blog post; it doesn't account for or detect split-brain conditions. You could either go with a simple port check for availability, or write a custom check which queries each of the Sentinels and only talks to the node that at least two Sentinels report as the master.
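A rough sketch of that kind of check (assumes redis-py is installed; the Sentinel addresses, master name, and quorum of two are placeholders, and the argument positions follow HAProxy's external-check convention, which you should verify for your version):

```python
#!/usr/bin/env python3
# Sketch of a quorum-aware master check. With HAProxy's external-check, the
# arguments are typically <proxy_addr> <proxy_port> <server_addr> <server_port>.
import sys

import redis

SENTINELS = [("sentinel-1", 26379), ("sentinel-2", 26379), ("sentinel-3", 26379)]
MASTER_NAME = "mymaster"
QUORUM = 2


def main() -> int:
    server_ip, server_port = sys.argv[3], sys.argv[4]

    votes = 0
    for host, port in SENTINELS:
        try:
            conn = redis.Redis(host=host, port=port,
                               socket_timeout=0.5, decode_responses=True)
            master = conn.sentinel_get_master_addr_by_name(MASTER_NAME)
        except redis.RedisError:
            continue  # an unreachable Sentinel simply doesn't get a vote
        if master and master[0] == server_ip and int(master[1]) == int(server_port):
            votes += 1

    # Report healthy (exit 0) only when a quorum of Sentinels agrees
    # that the server being checked is the current master.
    return 0 if votes >= QUORUM else 1


if __name__ == "__main__":
    sys.exit(main())
```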

The Real Bill
  • So I attempted to rely on Sentinel to elect a new master in my cluster and provide HAProxy with the new routing information. However, the downtime associated with this solution is unacceptable; I'm seeing at least a minute of downtime from a client's perspective. I'm thinking of simplifying the architecture and removing Kubernetes from the equation for now. – Mulloy Mar 22 '15 at 18:47
  • The primary factor here is likely the default down-after-milliseconds of 30 seconds. I've run it successfully with a three-second value; going much lower easily results in a lot of spurious down detections, causing excessive failover. – The Real Bill Mar 23 '15 at 16:05
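If you want to experiment with that timeout without redeploying, a rough sketch of lowering it at runtime (redis-py assumed; the Sentinel address is a placeholder, and the three-second value mirrors the comment above):

```python
# Sketch: lower down-after-milliseconds on a running Sentinel to three seconds.
# SENTINEL SET only affects the Sentinel you run it against, so repeat for each one.
import redis

sentinel = redis.Redis(host="sentinel-1", port=26379, decode_responses=True)
sentinel.execute_command("SENTINEL", "SET", "mymaster", "down-after-milliseconds", "3000")
```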