I have a service behind a DNS name [https://example.com/] that points to a Load Balancer in AWS, which routes all the requests. One endpoint, which requires authentication, is returning 504 Gateway Timeout. I can hit the same endpoint locally using Postman without any problem, but the issue persists when going through the DNS name.

  1. There are other endpoints which are working fine.
  2. When I run `ping example.com`, it resolves to the ELB DNS name, but then every request times out:
    PING <ELB DNS>: 56 data bytes
    Request timeout for icmp_seq 0
    Request timeout for icmp_seq 1
    Request timeout for icmp_seq 2
    Request timeout for icmp_seq 3
    Request timeout for icmp_seq 4
    Request timeout for icmp_seq 5
    Request timeout for icmp_seq 6
    Request timeout for icmp_seq 7
    Request timeout for icmp_seq 8
    Request timeout for icmp_seq 9
    Request timeout for icmp_seq 10
    Request timeout for icmp_seq 11
    Request timeout for icmp_seq 12
    Request timeout for icmp_seq 13

Need some pointers on debugging this.
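As a possible first check: load balancers often do not answer ICMP at all (it depends on the security group), so the ping timeouts above may not say much by themselves. A minimal sketch that instead resolves the name and tries a TCP connection to each resolved address is below. The function name `tcp_check`, the default port 443 and the 5 second timeout are assumptions for illustration, not anything from my setup.

```python
import socket


def tcp_check(host: str, port: int = 443, timeout: float = 5.0) -> dict:
    """Resolve `host` and try a TCP connect to every address it resolves to.

    Returns a dict mapping each resolved IP to "open" or "failed: <reason>".
    The name, default port and timeout are illustrative assumptions.
    """
    results = {}
    # getaddrinfo follows CNAMEs, so this covers the ELB addresses behind the name.
    infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    for family, socktype, proto, _canonname, sockaddr in infos:
        ip = sockaddr[0]
        try:
            with socket.socket(family, socktype, proto) as sock:
                sock.settimeout(timeout)
                sock.connect(sockaddr)
            results[ip] = "open"
        except OSError as exc:
            # Covers timeouts, refused connections and unreachable networks.
            results[ip] = f"failed: {exc}"
    return results
```

Calling `tcp_check("example.com")` would show whether the listener port is reachable on each load balancer address. If every address is "open", the 504 is more likely to come from the hop between the load balancer and its targets than from DNS or the listener.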

The rules are as follows:

rules:
    - host: <hostname>
      http:
        paths:
         - path: /*
           backend:
             serviceName: ssl-redirect
             servicePort: use-annotation
         - path: /v?/* 
           backend:
             serviceName: service A
             servicePort: 80
         - path: /*
           backend:
             serviceName: service B
             servicePort: 80
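To reason about which backend a given path should reach, here is a rough sketch that assumes the rules are evaluated in the order listed and that the first matching path pattern wins. It only models the two backend rules (the `ssl-redirect` entry is left out), and it uses Python's `fnmatchcase` as an approximation of the `*` and `?` wildcards, which may differ from the load balancer's matching in edge cases such as `[` characters. `RULES` and `route` are hypothetical names.

```python
from fnmatch import fnmatchcase

# Backend rules copied from the ingress above, in the order they are listed.
# The ssl-redirect rule is deliberately omitted (assumption: it only redirects).
RULES = [
    ("/v?/*", "service A"),
    ("/*", "service B"),
]


def route(path, rules=RULES):
    """Return the service of the first rule whose pattern matches `path`."""
    for pattern, service in rules:
        # `?` matches exactly one character, `*` matches any run of characters.
        if fnmatchcase(path, pattern):
            return service
    return None
```

Under these assumptions, `route("/v2/something")` gives `service A`, while a path such as `/v10/something` falls through to `service B`, because `?` only covers a single character.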
harry123
  • Do you mean, you are not able to access using the domain name but can access using the domain name of the ELB/IP? – GSSwain Apr 24 '21 at 22:11
  • @GSSwain No, I meant that I can hit `https://localhost:8082/` but not `https://example.com/`. When I am hitting the `https://localhost:8082/` with exact same headers and cookies, I am having no issues. However, `https://example.com/` gives 504. – harry123 Apr 25 '21 at 00:57
  • @GSSwain ELB_5XX in cloudwatch is showing few counts which would be of the endpoint I am hitting. Further, I think the time it is showing for response is 1.0 min. So, I think the timeout is 60s and hence I am getting 504. – harry123 Apr 25 '21 at 01:06
  • In my logs, I am getting `connection timed out : connect`, if it helps. – harry123 Apr 25 '21 at 01:29
  • Without the details of how you deploy, it seems to me that you may be missing a rule: https://docs.aws.amazon.com/elasticloadbalancing/latest/application/listener-update-rules.html . Also check the general troubleshooting guide: https://docs.aws.amazon.com/elasticloadbalancing/latest/application/load-balancer-troubleshooting.html#http-504-issues – GSSwain Apr 25 '21 at 03:06
  • Double check the health configuration https://docs.aws.amazon.com/elasticloadbalancing/latest/application/target-group-health-checks.html – GSSwain Apr 25 '21 at 03:07
  • @GSSwain there are few IP with 404 status codes while few others are healthy. The rules should be ok because I can hit the other endpoints [similar starting characters so the regex is working] without any problem. – harry123 Apr 25 '21 at 05:14
  • @GSSwain Damn even the ec2 instance Security group is different than load balancer. Is there a way in which I can redirect the url to localhost for debugging? Might be easier that way as well. – harry123 Apr 25 '21 at 06:10
  • @GSSwain Hello again, I have tried to fix the health check issues. There is no issue with the code as the postman request succeeds in 7.5s. I also increased the timeout to 120s and it still fails. So, I am really hitting a deadend on troubleshooting as well. – harry123 Apr 27 '21 at 16:17
  • What is behind the ALB for that endpoint? Is it an EC2, Lambda? What is the tech stack of your server code? – GSSwain Apr 27 '21 at 22:34
  • @GSSwain When I am going to the ALB and checking the rule [relevant to that endpoint] it takes me to target groups. I do not have any lambda function at all. I checked that for any other endpoint which has the same rule seems to work. e.g. `example.com/v2/` works but when `example.com/v2/` is requested I get 504. I am not sure if I should add ingress rule to make sure the LB requests are going to correct nodes/ec2 instances even though I think its pointless since other endpoints work just fine. – harry123 Apr 27 '21 at 22:51
  • when you say example.com/v2/something, example.com is a DNS mapping for the ALB or an API Gateway sits in front of your ALB? If an API Gateway is there ensure you have the correct mapping for the same. Run `dig ` – GSSwain Apr 27 '21 at 23:21
  • You can also try accessing your api /v2/ and share the results here. – GSSwain Apr 27 '21 at 23:34
  • Ensure the keep-alive of your code is lower than the ALB idle timeout but higher than the time it takes your api to respond. https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Keep-Alive – GSSwain Apr 27 '21 at 23:45
  • @GSSwain Jesus. When I do `/v2/something1` I am getting 404 even though the `example.com/v2/something1` works. dig `example.com` returns the CNAME and VPC hostname is enabled. I added the ingress rules in question. is from load balancer description. – harry123 Apr 28 '21 at 00:59
  • @GSSwain : `dig` response ` example.com. 600 IN CNAME . . 60 IN A***** . 60 IN A*****` – harry123 Apr 28 '21 at 02:55
  • Also, when I am hitting the endpoint, the logs are not showing up in backend service container rather showing in frontend container. I am not sure if this is helpful. Frontend and backend services are sharing the same kubernetes namespace – harry123 Apr 28 '21 at 14:29
