How can I troubleshoot an AWS Application Load Balancer giving 504, while the EC2 instance behind it gives 200?

Question

I have an EC2 instance with a few applications successfully deployed onto it, listening for connections on ports 3000/3001/3002. I can correctly load a web page from it by connecting to its public DNS or public IP on the given port. I.e. curl http://<ec2-ip-address>:3000 works. So I know that the apps are running, and I know that the port bindings/firewall rules/EC2 security groups are all set up correctly to receive connections from the outside world.

I also have an Application Load Balancer, which is supposed to route traffic to the 3 apps depending on the host name, but it always gives me "504 Gateway Time-out". I've checked all the settings but I can't see what's wrong and I'm not really sure how to troubleshoot it from here.

The ALB has a single HTTPS/443 listener, with a cert that's valid for mydomain.com, app1.mydomain.com, app2.mydomain.com, app3.mydomain.com.
The listener has 3 rules, plus the default rule:
1. Host == app1.mydomain.com => app1-target-group
2. Host == app2.mydomain.com => app2-target-group
3. Host == app3.mydomain.com => app3-target-group
4. Default action (last resort) => default-target-group
Each target group contains only the single EC2 instance, over HTTP, with the following ports:
1. app1-target-group: 3000
2. app2-target-group: 3001
3. app3-target-group: 3002
4. default-target-group: 3000

Given that I can access the app directly, I'm sure it must be a problem with the way I've configured the ALB/listener/target groups. But the 504 doesn't give me much to go on.

I've tried to turn on access logs to an S3 bucket, but it doesn't seem to be writing anything there. There's a single object called ELBAccessLogTestFile, and no actual logs in the bucket.

EDIT: Some more information... I actually have nginx installed on the EC2 instance, which is where I was previously doing the SSL termination and hostname-to-port mapping/routing. If I change the default-target-group above to point to port 443 over HTTPS, then it works!

So for some reason, routing traffic:
- from the ALB to the EC2 instance over HTTPS on port 443 -> OK!
- from the ALB to the EC2 instance over HTTP on port 3000 -> Broken!

But again, I can hit the instance directly on HTTP/3000 from my laptop.

Can you post the security group settings of your EC2 instance(s)? Just because it is accessible via your Internet connection does not necessarily mean it is accessible from the Load Balancer, which would be making an internal VPC network connection. — Mark B, Oct 31 '17 at 14:48
At the moment, the ALB and the EC2 instance are in the same security group, which has ingress for ports 22, 80, 443, and 2376, and egress for 80, 443, and 53, all on 0.0.0.0/0. I did try temporarily adding ingress on 3000, purely for testing from my laptop. I assumed that 3000 ingress wouldn't be necessary for the ALB->EC2 communications though, as long as they're in the same security group. Is that not correct? Even with the 3000 ingress, the ALB still gives 504s anyway. — Cam Jackson, Oct 31 '17 at 15:06
Being in the same security group is entirely irrelevant (and not really recommended). That doesn't grant any special access at all. You need to open port 3000 in the security group. If you don't want it open to the public then use the security group ID in the source field. — Mark B, Oct 31 '17 at 15:08
Yeah I don't intend to leave them in the same security group, it was just an interim state while I change things around. I think I know why it's not working now, though. I added ingress 3000 while I was testing, which allowed access from my laptop, but I didn't add egress 3000, which I guess means that the ALB couldn't get out on 3000, even to access something in the same group on 3000. I'll try that now. — Cam Jackson, Oct 31 '17 at 15:11
Why are you defining egress rules at all? The default is for egress to be wide open in order to allow the server to make the outbound connections it needs, both to respond to requests as well as to download server updates, etc. — Mark B, Oct 31 '17 at 15:15
I'm using terraform to define my security groups, and terraform removes that default of allowing all egress, so you then have to define any egress that you need. I chose to specifically allow egress on 80/443/53, rather than add back the allow all egress rule. In any case, it works now! Yay! If you'd like to add as an answer your point that the security group rules need to specifically allow ingress/egress for comms within a security group, I'll happily accept that as the solution :) — Cam Jackson, Oct 31 '17 at 15:21

score 4 · Accepted Answer · answered Oct 31 '17 at 15:48

Communication between resources in the same security group is not open by default. Security group membership alone does not provide special access. You still need to open the ports in the security group to allow other resources in the security group to access those ports. You can specify the security group ID in the rule's source field if you don't want to open it up beyond the resources in the security group.
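
Since the security groups here are managed with Terraform, the rules described above might look roughly like the sketch below. It assumes the ALB and the instance are split into two separate security groups, which is not how the question currently has them. All resource names and `var.vpc_id` are hypothetical. The port range 3000-3002 covers the three apps from the question. The ALB's public ingress on 443 and the instance's other rules are left out for brevity.

```hcl
# Hypothetical names; var.vpc_id is assumed to be declared elsewhere.
resource "aws_security_group" "alb" {
  name   = "alb-sg"
  vpc_id = var.vpc_id
}

resource "aws_security_group" "app" {
  name   = "app-sg"
  vpc_id = var.vpc_id
}

# Egress from the ALB to the instance on the app ports.
# Terraform drops the default allow-all egress rule, so this has to be explicit.
resource "aws_security_group_rule" "alb_to_app" {
  type                     = "egress"
  protocol                 = "tcp"
  from_port                = 3000
  to_port                  = 3002
  security_group_id        = aws_security_group.alb.id
  source_security_group_id = aws_security_group.app.id # destination, for an egress rule
}

# Ingress on the instance from the ALB only, using the security group ID
# as the source instead of opening the ports to 0.0.0.0/0.
resource "aws_security_group_rule" "app_from_alb" {
  type                     = "ingress"
  protocol                 = "tcp"
  from_port                = 3000
  to_port                  = 3002
  security_group_id        = aws_security_group.app.id
  source_security_group_id = aws_security_group.alb.id
}
```

If both resources stay in one security group, the same two rules should still be needed, with that group's ID on both sides; membership in the group by itself opens nothing.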
