400 Handshake Error With Application Load Balancer AWS (Flask & Socket.io)

Question

I'm getting a 400 handshake error on POST requests to my Flask app running socket.io, even though I've added the NGINX configuration described in the docs and posts I read online. I'm using an Application Load Balancer in AWS with a Target Group on :80 and a :443 listener that forwards to that Target Group. I have also added a rule for the route /socket.io that forwards requests to the Target Group on :80, and I have enabled sticky sessions on the Target Group. I'm also using a Route53 domain name and enforcing SSL. Everything works fine except the socket communication.

NGINX conf file:

server {
    listen [::]:80;
    listen 80;
    server_name _domain_name_;
    access_log  /var/log/nginx/access.log;

    location / {
        proxy_pass http://127.0.0.1:8000;
        include proxy_params;
    }

    location /socket.io {
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;

        include proxy_params;
        proxy_http_version 1.1;
        proxy_buffering off;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection "Upgrade";
        proxy_pass http://127.0.0.1:8000/socket.io;
    }
}

And the JS file's connection for socket.io:

var socket = io();
socket.on('connect', () => {
    console.log(socket.connected); // true
});

Connection returns true.

UPDATE

I switched to an NLB and am still having the same issue; however, my NGINX logs now show:

connect() failed (111: Connection refused) while connecting to upstream
request: "GET /socket.io/?EIO=3&transport=polling&t=MvDPJhb HTTP/1.1",
upstream: "http://127.0.0.1:8000/socket.io/?EIO=3&transport=polling&t=MvDPJhb"
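
Error 111 (Connection refused) generally means that nothing was accepting connections at the upstream address when NGINX tried it. A minimal Python sketch for checking this from the instance itself follows. The helper name `is_listening` is hypothetical, and the host and port are simply the ones shown in the log line above.

```python
import socket


def is_listening(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers ConnectionRefusedError (errno 111 on Linux) and timeouts.
        return False


if __name__ == "__main__":
    # 127.0.0.1:8000 is the upstream named in the NGINX error above.
    print(is_listening("127.0.0.1", 8000))
```

If this prints False while the app is supposedly running, the application server is probably not bound to the address and port that `proxy_pass` points at.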

This probably isn't the answer you're looking for, but websockets have a bad rep with ALB. Have you tried NLB? Enabling and looking at the ALB logs may help as well, as may the NGINX logs on the instance. — Simon, Nov 06 '19 at 02:31
I don't understand how using an NLB would work though? I mean a CLB would, but I need the extra capabilities an ALB provides... NGINX logs nothing out for this and access logs look fine? I haven't looked at the ALB logs though, I didn't set that up — Eder Maza, Nov 06 '19 at 17:36
Can you explain why CLB would? What extra capabilities of ALB do you need? Try and get some info from ALB. In general, NLB is more "stable" in terms of scaling events and traffic persistence than ALB. https://aws.amazon.com/blogs/aws/new-network-load-balancer-effortless-scaling-to-millions-of-requests-per-second/ — Simon, Nov 07 '19 at 00:18
Because with CLB I would have support for TCP, which I can configure to route my HTTP requests (from socket.io) through. Or at least that's what I've read. Also, I need the backend auth and the path-based routing for future features. I don't think it's necessary now, but I would like to set up everything through the ALB now if I can so I don't have to do extra work down the road. — Eder Maza, Nov 07 '19 at 00:49
Ah, I see. Well I think getting ALB logs and then going from there would be good. I've heard of people having massive issues using websockets through ALB because ALB will kill those connections. Setting the timeout may help as ALB will close connections if data isn't sent after a specified period of time. Have you seen this post? https://stackoverflow.com/questions/41381444/websocket-connection-failed-error-during-websocket-handshake-unexpected-respon — Simon, Nov 07 '19 at 01:06
Switched to NLB just to see if it changes anything and I'm still getting the same error. AWS also only allows for access logs to be viewable from the load balancers so when I enabled those, I didn't really get anything helpful back unfortunately... — Eder Maza, Nov 09 '19 at 00:23
Does your client code (within JavaScript that your Flask/Jinja code uses) do an explicit call to 127.0.0.1:8000? If so can you try 0.0.0.0:8000? And yes you must enable access logs :\ — Simon, Nov 09 '19 at 21:25
When I change the JavaScript to `var socket = io('0.0.0.0:8000');`, I get this response at polling-xhr.js:264: `GET https://0.0.0.0:8000/socket.io/?EIO=3&transport=polling&t=MvQufab` — Eder Maza, Nov 11 '19 at 14:36
Yes, I get this NGINX error `2019/11/11 20:44:44 [error] 3470#0: *53447 connect() failed (111: Connection refused) while connecting to upstream, client: _IP_address_, server: _domain_name_, request: "GET /favicon.ico HTTP/1.1", upstream: "http://127.0.0.1:8000/favicon.ico", host: "_domain_name_", referrer: "_domain_name_/path"` and also in the console I see `polling-xhr.js:264 GET https://0.0.0.0:8000/socket.io/?EIO=3&transport=polling&t=MvSED6k net::ERR_CONNECTION_REFUSED` Sorry I'm late on the reply @Simon — Eder Maza, Nov 11 '19 at 20:48
No worries. Have you verified the security groups on the load balancer and your application server? What server are you using for the Flask application? gunicorn/gevent/etc? — Simon, Nov 11 '19 at 23:43
The Security Groups for the NLB/EC2 are open for all TCP ports in the range 0 - 65535, with the source set to 0.0.0.0/0. HTTP and HTTPS are open as well, and the source is also 0.0.0.0/0. The server I'm using is an Amazon Linux 2 AMI (RHEL) and it is a Flask app running with gunicorn. The command for deployment I use is `gunicorn --timeout 300 --workers 3 -k geventwebsocket.gunicorn.workers.GeventWebSocketWorker -w 1 app:app` @Simon — Eder Maza, Nov 12 '19 at 00:15
How does nginx know to proxy to gunicorn? I don't see any port specifications in your `gunicorn` command. Can you add --bind 0.0.0.0:8000 to it and try? And if that doesn't work, try 127.0.0.1:8000, and remember to change back your polling-xhr.js as well — Simon, Nov 12 '19 at 00:42
`gunicorn --timeout 350 --bind 0.0.0.0:8000 --workers 3 -k geventwebsocket.gunicorn.workers.GeventWebSocketWorker -w 1 app:app` Just always remember that the address:port gunicorn is bound to, the address:port nginx proxies to, and the socket.io xhr script's address:port should be the same (I've made this mistake a lot lol) — Simon, Nov 12 '19 at 00:43
And I changed the timeout to 350 because that's the same timeout that NLB has, which you can't change: https://docs.aws.amazon.com/elasticloadbalancing/latest/network/network-load-balancers.html#connection-idle-timeout — Simon, Nov 12 '19 at 00:44
Also, I noticed you specified --workers 3 but you do -w 1. --workers and -w are the same flag and 1 is the default: http://docs.gunicorn.org/en/stable/settings.html#workers — Simon, Nov 12 '19 at 00:50
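
The effect of passing both `--workers 3` and `-w 1` can be illustrated with Python's argparse, where a later occurrence of an option overrides an earlier one. This is a sketch with a stand-in parser, not gunicorn's own code, and it assumes gunicorn treats repeated options the same way.

```python
import argparse

# Stand-in parser, not gunicorn's own: one option with a short and a long name.
parser = argparse.ArgumentParser()
parser.add_argument("-w", "--workers", type=int, default=1)

# Same flag order as the deployment command in the thread.
args = parser.parse_args(["--workers", "3", "-w", "1"])
print(args.workers)  # 1: the later occurrence overrides the earlier one
```

Under that assumption, the command in the thread would run a single worker rather than three.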
Okay, so I tried the change and still got the exact same errors. I also changed socket = io('0.0.0.0:8000') back to io() and then got the 400 error I received before, which was `polling-xhr.js:264 POST https://www._domain_name_.com/socket.io/?EIO=3&transport=polling&t=MvTCmuA&sid=e861587a822a4d4d8d86d4b4a9150bb5` Also, I believe that nginx knows to proxy to gunicorn because it serves on 8000 by default. The rest of the site functionality works and the page comes up fine; it's just the socket.io portion that's having issues. — Eder Maza, Nov 12 '19 at 01:26
The NGINX config is in the description. I updated it. @Simon — Eder Maza, Nov 12 '19 at 01:28
My Flask app is being served on :8000, but my site and the NLB are on 443. So when I try to change the socket.io config to socket = io('127.0.0.1:8000'), I get: `GET https://127.0.0.1:8000/socket.io/?EIO=3&transport=polling&t=MvTFLqu net::ERR_CONNECTION_REFUSED` And when I leave it as just socket = io(), everything works and the socket is up, but there is the POST 404 error I showed above. I think that's because when it routes the request back through the NLB it's coming in on 443, and maybe NGINX doesn't know what to do with that? — Eder Maza, Nov 12 '19 at 01:40
Not too sure how to debug that. Are you sure your domain is also on www? www is just a subdomain. Can you see the logs for the 404? And you can't do https://127.0.0.1:8000 since the SSL is on your load balancer, not the web server, right? — Simon, Nov 12 '19 at 03:44
Yeah, I’m sure. The rest of the site is live and well, the only thing going wrong is that post request to the socket. Also, I’m sorry it was the same 400 error I got above with the socket, not 404. And yes exactly. @Simon — Eder Maza, Nov 13 '19 at 02:47
Are you doing any sort of CORS management on your Flask application? Like setting Header, Origin, Credential options? — Simon, Nov 19 '19 at 20:54
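
One way an origin check could reject the handshake behind an SSL-terminating load balancer is a scheme mismatch: the browser sends an https Origin, while the app behind NGINX sees plain http. The sketch below only illustrates that comparison. It is not taken from Flask or socket.io, the helper name `same_origin` is hypothetical, and `www.example.com` stands in for the redacted domain.

```python
from urllib.parse import urlsplit


def same_origin(origin_header: str, scheme: str, host: str) -> bool:
    """Compare a browser Origin header with the scheme and host the app sees."""
    origin = urlsplit(origin_header)
    return (origin.scheme, origin.netloc.lower()) == (scheme, host.lower())


# Browser is on HTTPS; the app behind the proxy sees plain HTTP.
print(same_origin("https://www.example.com", "http", "www.example.com"))   # False
# Same request if the app is told the original scheme was HTTPS.
print(same_origin("https://www.example.com", "https", "www.example.com"))  # True
```

Whether this applies here depends on whether the application performs such a check at all, which the thread does not confirm.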

0 Answers