
I have Django Channels (with Redis) served by Daphne, running behind an Nginx ingress controller, which in turn sits behind a load balancer, all set up in Kubernetes. The WebSocket is upgraded and everything runs fine... for a few minutes. After 5-15 minutes (it varies), my Daphne logs (run with -v 2 for debugging) show:

WARNING dropping connection to peer tcp4:10.2.0.163:43320 with abort=True: WebSocket ping timeout (peer did not respond with pong in time)

10.2.0.163 is the cluster IP address of my Nginx pod. Immediately after, Nginx logs the following:

[error] 39#39: *18644 recv() failed (104: Connection reset by peer) while proxying upgraded connection [... + client real IP]

After this, the WebSocket connection behaves strangely: the client can still send messages to the backend, but the same WebSocket connection in Django Channels no longer receives group messages, as if the channel had been unsubscribed from the group. I know my code works, since everything runs smoothly until the error is logged, but I'm guessing there is a configuration error somewhere that causes the problem. I'm sadly all out of ideas. Here is my nginx ingress:

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  annotations:
    kubernetes.io/ingress.class: "nginx"
    cert-manager.io/cluster-issuer: "letsencrypt-prod"
    acme.cert-manager.io/http01-edit-in-place: "true"
    nginx.ingress.kubernetes.io/proxy-read-timeout: "3600"
    nginx.ingress.kubernetes.io/proxy-send-timeout: "3600"
    nginx.org/websocket-services: "daphne-svc"
  name: ingress
  namespace: default
spec:
  tls:
  - hosts:
    - mydomain
    secretName: letsencrypt-secret
  rules:
    - host: mydomain
      http:
        paths:
          - path: /
            backend:
              service:
                name: uwsgi-svc
                port:
                  number: 80            
            pathType: Prefix
          - path: /ws
            backend:
              service:
                name: daphne-svc
                port:
                  number: 80            
            pathType: Prefix 

Configured according to this and this. Installation with helm:

helm repo add ingress-nginx https://kubernetes.github.io/ingress-nginx
helm repo update
helm install ngingress ingress-nginx/ingress-nginx
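
For completeness, the "WebSocket ping timeout" in the warning above is Daphne's own keepalive: Daphne pings the peer at a fixed interval and aborts the connection when no pong comes back in time. Both values can be tuned on the command line; the flags are real Daphne options, but the numbers and the ASGI module name below are only illustrative, not my exact command:

# Daphne's defaults are roughly --ping-interval 20 and --ping-timeout 30 (seconds);
# "myproject.asgi:application" is a placeholder for the real ASGI entry point.
daphne -b 0.0.0.0 -p 8000 -v 2 --ping-interval 20 --ping-timeout 30 myproject.asgi:application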

Here is my Django Channels consumer:

import json

from channels.generic.websocket import AsyncWebsocketConsumer

# Project-specific helpers (register_active_device, unregister_active_device,
# get_other_chat_members, ChatNotificationHandler) are imported elsewhere.


class ChatConsumer(AsyncWebsocketConsumer):

    async def connect(self):
        user = self.scope['user']
        if user.is_authenticated: 
            self.inbox_group_name = "inbox-%s" % user.id


            device = self.scope.get('device', None)
            added = False
            if device:
                added = await register_active_device(user, device)
                
            if added:
                # Join inbox group
                await self.channel_layer.group_add(
                    self.inbox_group_name,
                    self.channel_name
                )

                
                await self.accept()
            else:
                await self.close()
        else:
            await self.close()

    
    async def disconnect(self, close_code):
        user = self.scope['user']
        device = self.scope.get('device', None)
        if device:
            await unregister_active_device(user, device)
        # Leave room group
        if hasattr(self, 'inbox_group_name'):
            await self.channel_layer.group_discard(
                self.inbox_group_name,
                self.channel_name
            )
            
        

    """
    Receive message from room group; forward it to client
    """
    async def group_message(self, event):
        message = event['message']
        
        # Send message to WebSocket 
        await self.send(text_data=json.dumps(message))
        

    async def forward_message_to_other_members(self, chat, message, notification_fallback=False):    

        user = self.scope['user']
        other_members = await get_other_chat_members(chat, user)                
        for member in other_members:
            if member.active_devices_count > 0:
                # This will send the message to the user's inbox; each consumer will handle it with its group_message method
                await self.channel_layer.group_send(
                    member.inbox.group_name,
                    {
                        'type': 'group_message',
                        'message': message
                    }
                )
            else:
                # No connection for this user; send a notification instead
                if notification_fallback:
                    await ChatNotificationHandler().send_chat_notification(chat, message, recipient=member, author=user)  
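
For context, messages reach group_message through the channel layer, so the same inbox group can also be targeted from outside the consumer (a view, a management command, etc.). Below is a minimal sketch using the Channels channel-layer API; the group name and payload are purely illustrative:

from asgiref.sync import async_to_sync
from channels.layers import get_channel_layer

channel_layer = get_channel_layer()

# "inbox-42" follows the "inbox-<user id>" naming used in connect(); the dict is
# dispatched to ChatConsumer.group_message because of its "type" key.
async_to_sync(channel_layer.group_send)(
    "inbox-42",
    {"type": "group_message", "message": {"text": "hello"}},
)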
  • Which nginx ingress controller did you deploy? Can you pass me a link? – Matt May 28 '21 at 09:12
  • I edited my post to indicate the installation method. The version shows: NGINX Ingress controller Release: v0.46.0 Build: 6348dde672588d5495f70ec77257c230dc8da134 Repository: https://github.com/kubernetes/ingress-nginx nginx version: nginx/1.19.6 – mrj May 28 '21 at 09:17
  • Your ingress controller is [this one](https://github.com/kubernetes/ingress-nginx) and your annotation (`nginx.org/websocket-services`) is for [this one](https://github.com/nginxinc/kubernetes-ingress). Yes, these are two different controllers. But the good thing is that the one you deployed comes with [websocket](https://kubernetes.github.io/ingress-nginx/user-guide/miscellaneous/#websockets) support out of the box, so you don't need this annotation. – Matt May 28 '21 at 09:48
  • Thanks for your input. This means that I should remove the useless `nginx.org/websocket-services` annotation, but also that my problem remains :) – mrj May 28 '21 at 09:51
  • "but also that my problem remains" - I know. Can you try to bypass ingress; simplify the deployment? If the problem disappears you will know with high certainty that the issue is with ingress. If the issue will persist then the issue is somewhere else. – Matt May 28 '21 at 09:54
  • Or use tcpdump and look for the RST packet. Who is sending the RST? Is it the client? Is it nginx? Is it the server? – Matt May 28 '21 at 10:24
  • I like the idea. How do I tcpdump a kubernetes pod? – mrj May 28 '21 at 10:34
  • Try to use [ksniff](https://github.com/eldadru/ksniff) (usage sketch after these comments) – Matt May 28 '21 at 10:40
  • Wireshark indicates that all RST packets are sent from my node's IP (I guess that means from the client) towards the Nginx pod. I suppose this means the RST packet is sent from the client device? That's kind of hard to believe; the client would have no reason to close (my mobile application is constantly in the foreground). – mrj May 28 '21 at 11:53
  • Oh, so it's a mobile client? Are you on wifi or cellular? Maybe the mobile is jumping between the two and this causes the RST. – Matt May 28 '21 at 11:57
  • I've tested on a real device and on an emulator, which uses the LAN, so that can't be the cause. At the moment I'm also checking my Redis database content to see if the information there is OK. – mrj May 28 '21 at 12:17
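
For reference, the ksniff capture suggested in the comments above looks roughly like this (the pod name is a placeholder, and the plugin is installed via krew):

# Install the kubectl plugin once via krew, then capture traffic from the
# nginx ingress pod into a pcap file that Wireshark can open.
kubectl krew install sniff
kubectl sniff <nginx-ingress-pod-name> -n default -o capture.pcap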

1 Answer


I ended up adding a ping interval on the client and increasing the nginx timeout to one day, which solved the problem but also suggests it's probably not an nginx/daphne configuration issue.
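
The actual client is a mobile app, so the snippet below only sketches the keepalive idea using the Python websockets library (the URL and interval values are illustrative). Increasing the nginx timeout to one day corresponds to setting the proxy-read-timeout / proxy-send-timeout annotations from the question to "86400".

import asyncio
import websockets

async def run():
    # Periodic protocol-level pings keep the connection from ever looking idle
    # to intermediaries; the interval/timeout values are illustrative.
    async with websockets.connect(
        "wss://mydomain/ws", ping_interval=20, ping_timeout=20
    ) as ws:
        async for message in ws:
            print(message)

asyncio.run(run())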
