
I have the following Redis nodes running in Kubernetes:

NAME                                            READY   STATUS    RESTARTS   AGE
pod/redis-haproxy-deployment-65497cd78d-659tq   1/1     Running   0          31m
pod/redis-sentinel-node-0                       3/3     Running   0          81m
pod/redis-sentinel-node-1                       3/3     Running   0          80m
pod/redis-sentinel-node-2                       3/3     Running   0          80m
pod/ubuntu                                      1/1     Running   0          85m

NAME                              TYPE        CLUSTER-IP     EXTERNAL-IP   PORT(S)              AGE
service/redis-haproxy-balancer    ClusterIP   10.43.92.106   <none>        6379/TCP             31m
service/redis-sentinel-headless   ClusterIP   None           <none>        6379/TCP,26379/TCP   99m
service/redis-sentinel-metrics    ClusterIP   10.43.72.97    <none>        9121/TCP             99m

NAME                                       READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/redis-haproxy-deployment   1/1     1            1           31m

NAME                                                  DESIRED   CURRENT   READY   AGE
replicaset.apps/redis-haproxy-deployment-65497cd78d   1         1         1       31m

NAME                                   READY   AGE
statefulset.apps/redis-sentinel-node   3/3     99m

I connect to the Redis master using the following command:

redis-cli -h redis-haproxy-balancer

redis-haproxy-balancer:6379> keys *
1) "sdf"
2) "sdf12"
3) "s4df12"
4) "s4df1"
5) "fsafsdf"
6) "!s4d!1"
7) "s4d!1"

Here is my configuration file haproxy.cfg:

global
  daemon
  maxconn 256


defaults REDIS
  mode tcp
  timeout connect 3s
  timeout server 3s
  timeout client 3s


frontend front_redis
  bind 0.0.0.0:6379
  use_backend redis_cluster


backend redis_cluster
  mode tcp
  option tcp-check
  tcp-check comment PING\ phase
  tcp-check send PING\r\n
  tcp-check expect string +PONG
  tcp-check comment role\ check
  tcp-check send info\ replication\r\n
  tcp-check expect string role:master
  tcp-check comment QUIT\ phase
  tcp-check send QUIT\r\n
  tcp-check expect string +OK     

  server redis-0 redis-sentinel-node-0.redis-sentinel-headless:6379 maxconn 1024 check inter 1s 
  server redis-1 redis-sentinel-node-1.redis-sentinel-headless:6379 maxconn 1024 check inter 1s 
  server redis-2 redis-sentinel-node-2.redis-sentinel-headless:6379 maxconn 1024 check inter 1s 

Here is the Service I connect to in order to reach the Redis master - haproxy-service.yaml:

apiVersion: v1 
kind: Service 
metadata: 
  name: redis-haproxy-balancer
spec: 
  type: ClusterIP
  selector: 
    app: redis-haproxy 
  ports: 
  - protocol: TCP
    port: 6379 
    targetPort: 6379

Here is the Deployment that references the configuration file - redis-haproxy-deployment.yaml:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: redis-haproxy-deployment
  labels:
    app: redis-haproxy
spec:
  replicas: 1
  selector:
    matchLabels:
      app: redis-haproxy
  template:
    metadata:
      labels:
        app: redis-haproxy
    spec:
      containers:
      - name: redis-haproxy
        image: haproxy:lts-alpine
        volumeMounts:
        - name: redis-haproxy-config-volume
          mountPath: /usr/local/etc/haproxy/haproxy.cfg
          subPath: haproxy.cfg 
        ports:
        - containerPort: 6379
      volumes:
      - name: redis-haproxy-config-volume
        configMap:
          name: redis-haproxy-config
          items:
          - key: haproxy.cfg
            path: haproxy.cfg  

After restarting Redis, I can no longer connect to it through redis-haproxy-balancer. The HAProxy logs show:

[NOTICE]   (1) : New worker (8) forked
[NOTICE]   (1) : Loading success.
[WARNING]  (8) : Server redis_cluster/redis-0 is DOWN, reason: Layer7 timeout, info: " at step 6 of tcp-check (expect string 'role:master')", check duration: 1000ms. 2 active and 0 backup servers left. 0 sessions active, 0 requeued, 0 remaining in queue.
[WARNING]  (8) : Server redis_cluster/redis-1 is DOWN, reason: Layer7 timeout, info: " at step 6 of tcp-check (expect string 'role:master')", check duration: 1005ms. 1 active and 0 backup servers left. 0 sessions active, 0 requeued, 0 remaining in queue.
[WARNING]  (8) : Server redis_cluster/redis-2 is DOWN, reason: Layer7 timeout, info: " at step 6 of tcp-check (expect string 'role:master')", check duration: 1001ms. 0 active and 0 backup servers left. 0 sessions active, 0 requeued, 0 remaining in queue.
[ALERT]    (8) : backend 'redis_cluster' has no server available!

It only works by connecting directly: redis-sentinel-node-0.redis-sentinel-headless

What is wrong with my HAProxy setup?

    I think the problem is that HAProxy caches the IP addresses. After deleting the pods, the IP addresses change too, but HAProxy doesn't see the new ones. – cevoye Dec 15 '22 at 13:13

2 Answers

You will need to add a resolvers section and point it at the Kubernetes DNS service.

Kubernetes: DNS for Services and Pods
HAProxy: Server IP address resolution using DNS

resolvers mydns
  nameserver dns1 Kubernetes-DNS-Service-ip:53

  resolve_retries       3
  timeout resolve       1s
  timeout retry         1s
  hold other           30s
  hold refused         30s
  hold nx              30s
  hold timeout         30s
  hold valid           10s
  hold obsolete        30s

backend redis_cluster
  mode tcp
  option tcp-check
  ... # your other settings
  
  server redis-0 redis-sentinel-node-0.redis-sentinel-headless:6379 resolvers mydns maxconn 1024 check inter 1s
  server redis-1 redis-sentinel-node-1.redis-sentinel-headless:6379 resolvers mydns maxconn 1024 check inter 1s
  server redis-2 redis-sentinel-node-2.redis-sentinel-headless:6379 resolvers mydns maxconn 1024 check inter 1s
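For reference, a minimal sketch of the filled-in resolvers section is below. The 10.43.0.10 address is an assumption (the typical kube-dns ClusterIP on a k3s-style cluster, which the 10.43.0.0/16 Service CIDR in the question suggests); verify it on your own cluster with `kubectl -n kube-system get svc kube-dns`:

```
resolvers mydns
  # Assumed DNS Service IP -- check with:
  #   kubectl -n kube-system get svc kube-dns -o jsonpath='{.spec.clusterIP}'
  nameserver dns1 10.43.0.10:53
  resolve_retries 3
  timeout resolve 1s
  timeout retry   1s
  hold valid      10s

backend redis_cluster
  mode tcp
  option tcp-check
  # ... tcp-check rules as in the question ...
  server redis-0 redis-sentinel-node-0.redis-sentinel-headless:6379 resolvers mydns maxconn 1024 check inter 1s
```

With `resolvers mydns` on the server line, HAProxy re-queries the DNS at runtime instead of only once at startup, so health checks recover after a pod gets a new IP.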
– Aleksandar

The problem is that HAProxy resolves DNS names only once, at startup. This becomes a problem when you re-create pods, because the IP addresses of the backend pods may change. Unfortunately, there is no way to simply tell HAProxy to resolve them again on demand!

However, you can define your own resolvers section with the "hold valid" option to control how long HAProxy keeps the resolution results, together with the "parse-resolv-conf" option. Then reference this resolver on your server lines.

resolvers globaldnspolicy
  parse-resolv-conf
  hold valid 30s
  ...
listen myservice
  bind *:8080 accept-proxy
  mode http
  server theserver theserver.mynamespace.svc.cluster.local check resolvers globaldnspolicy

"parse-resolv-conf" tells HAProxy to parse /etc/resolv.conf, so that you do not need to hardcode the DNS servers on your own. I found this more elegant, since we're using Kubernetes and/or do not have the IP addresses of the DNS services.

"hold valid" tells HAProxy to cache the results, not indefinitely. I do not have a specific reason to justify the 30-second number, but I thought it would be good to still have some caching to avoid hammering the DNS services.

Since we are using Services in Kubernetes, there is one difference in DNS resolution behaviour once this is done: it would appear that names now have to be specified as FQDNs, otherwise we end up with NXDOMAIN errors. As documented for HAProxy, DNS names are by default resolved with the libc functions (likely gethostbyname etc.), which honour the search domains in /etc/resolv.conf; with a resolvers section, HAProxy instead makes the queries itself against the DNS servers it has learned about. I do not consider myself an expert, but the documentation about DNS for Services and Pods describes the expected behaviour, and I think it explains what happened for me. So you need to enter the FQDN of your services in order to keep things working.
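Applied to the asker's backend, a sketch could look like the following. The namespace (default) and cluster domain (cluster.local) are assumptions; adjust both to your cluster:

```
resolvers globaldnspolicy
  parse-resolv-conf
  hold valid 30s

backend redis_cluster
  mode tcp
  option tcp-check
  # ... tcp-check rules as in the question ...
  server redis-0 redis-sentinel-node-0.redis-sentinel-headless.default.svc.cluster.local:6379 resolvers globaldnspolicy maxconn 1024 check inter 1s
  server redis-1 redis-sentinel-node-1.redis-sentinel-headless.default.svc.cluster.local:6379 resolvers globaldnspolicy maxconn 1024 check inter 1s
  server redis-2 redis-sentinel-node-2.redis-sentinel-headless.default.svc.cluster.local:6379 resolvers globaldnspolicy maxconn 1024 check inter 1s
```

The fully qualified names avoid the NXDOMAIN errors described above, because HAProxy's own resolver does not apply the resolv.conf search domains the way libc does.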

– SP193