Setup Introduction: I have a Node.js app with 3 services: admin, client and server. Each service runs in its own Docker container. My setup consists of 2 EC2 instances behind an AWS Application Load Balancer. Each EC2 instance runs 1 container each of the admin and client services, and the server service is scaled to 2 containers using the docker-compose --scale option. I'm using containerised nginx as a reverse proxy and load balancer. I have a target group with both instances as registered targets.
Problem description: The admin service needs to communicate with the server service via WebSocket, and I'm using socket.io for that purpose. This scenario requires sticky sessions to establish the WebSocket connection. I have enabled sticky sessions at the instance level with the nginx hash directive (keyed on $remote_addr) in the upstream block for the server service. At the ALB level, I've enabled sticky sessions for the target group with the load balancer generated cookie type. When I access the endpoint for the admin service in Chrome and open the developer tools, I can see that the WebSocket connection fails to establish, with these exact errors:
WebSocket connection to '<URL>' failed: WebSocket is closed before the connection is established.
Failed to load resource: the server responded with a status of 400 ()
This is my nginx conf for the server service:
    upstream webinar_server {
        hash $remote_addr consistent;
        server webinar-server_webinar_server_1:8000;
        server webinar-server_webinar_server_2:8000;
    }
    server {
        listen 80;
        server_name server.mydomain.com;
        location / {
            proxy_pass http://webinar_server/;
            proxy_set_header X-Real-IP $remote_addr;
            proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
            proxy_set_header Host $host;
            proxy_set_header Upgrade $http_upgrade;
            proxy_set_header Connection "upgrade";
            proxy_set_header X-Forwarded-Proto $scheme;
            proxy_http_version 1.1;
            proxy_buffering off;
        }
    }
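One thing that may be worth checking in the server config: behind an ALB, nginx sees the load balancer node as its peer, so $remote_addr is normally the ALB node's private address, and the original client address arrives in the X-Forwarded-For header. The following is a minimal sketch of restoring the client address before the upstream hash is computed. It assumes the nginx image includes ngx_http_realip_module, and the 10.0.0.0/16 range is a hypothetical VPC range, not something taken from this setup.

```nginx
# Sketch only. 10.0.0.0/16 is a placeholder for the range the ALB nodes live in.
server {
    listen 80;
    server_name server.mydomain.com;

    # Trust X-Forwarded-For only when the request comes from the assumed ALB range.
    set_real_ip_from 10.0.0.0/16;
    real_ip_header X-Forwarded-For;
    # Skip trusted addresses in the header and use the last untrusted one as the client address.
    real_ip_recursive on;

    # location / { ... } unchanged
}
```

With this in place, a hash keyed on $remote_addr would use the client address rather than the address of whichever ALB node forwarded the request. Whether that resolves the 400 is not confirmed here.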
This is the nginx conf for the admin service:
    server {
        listen 80;
        server_name admin.mydomain.com;
        location / {
            proxy_pass http://webinar_admin:5001;
            proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
            proxy_set_header Host $host;
            proxy_set_header Upgrade $http_upgrade;
            proxy_set_header Connection "upgrade";
            proxy_set_header X-Forwarded-Proto $scheme;
            proxy_http_version 1.1;
            proxy_buffering off;
        }
    }
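Both server blocks send Connection "upgrade" on every proxied request, including plain HTTP requests such as socket.io's long-polling calls. The nginx WebSocket proxying documentation describes an alternative that sets the header from the incoming Upgrade header. A minimal sketch for the admin service, where $connection_upgrade is a variable name introduced by the map and not a built-in:

```nginx
# Sketch only: $connection_upgrade is defined by this map, it is not a built-in variable.
# The map block belongs in the http context, outside any server block.
map $http_upgrade $connection_upgrade {
    default upgrade;   # client sent an Upgrade header: forward "Connection: upgrade"
    ''      close;     # no Upgrade header: forward "Connection: close"
}

server {
    listen 80;
    server_name admin.mydomain.com;
    location / {
        proxy_pass http://webinar_admin:5001;
        proxy_http_version 1.1;
        proxy_set_header Host $host;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection $connection_upgrade;
    }
}
```

This is a tidiness change and is not known to be the cause of the error above.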
I've tried: To test the stickiness of the infrastructure, I implemented a simpler setup, which worked as expected. I had 2 EC2 instances behind an AWS ALB, with each instance running 2 basic containerised nginx web servers, each serving a different HTML page. These web servers sit behind a containerised nginx reverse proxy and load balancer, as in my original setup. In this case, both the instance-level stickiness using the nginx hash function and the ALB-level target group stickiness worked as expected.
For the original setup I'm trying to implement, when I removed one of the instances from the target group (leaving only one registered target), the instance-level nginx stickiness worked fine, routing to the correct server container (there are 2 server containers). However, the target-group-level stickiness produces the error mentioned above.
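To see which container each request actually reaches during tests like these, the access log can record the hashed address next to the chosen upstream. A minimal sketch, where the format name sticky_debug and the log path are assumptions:

```nginx
# Sketch only: "sticky_debug" is a made-up format name. log_format goes in the http context.
log_format sticky_debug '$remote_addr xff="$http_x_forwarded_for" '
                        'upstream=$upstream_addr status=$status "$request"';

server {
    listen 80;
    server_name server.mydomain.com;
    # The log path is an assumption; adjust it to wherever the container writes logs.
    access_log /var/log/nginx/access.log sticky_debug;
    # location / { ... } unchanged
}
```

If consecutive socket.io requests from one browser show different upstream= values, or different $remote_addr values for the same X-Forwarded-For client, that would suggest the instance-level hash is not keyed on the real client.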