We had a primary ALB listening for all our apps, mapped through R53 records. We have now hit a listener rule crunch, as an ALB doesn't support more than 100 rules. The solution proposed to us was to put an NLB behind the primary ALB and then a secondary ALB behind the NLB. So the flow will be:

Requests--->R53--->ALB1--->NLB--->ALB2--->Apps

ALB1 has a default rule which allows unmatched requests to pass through to NLB and then ultimately to ALB2 where new rules are evaluated.

Rule configuration at ALB1 is: Default rule --Forwardto-->

Rule at NLB: TCP-443 listener rule --ForwardTo--> ALB2 TG with Fargate application IP

But we're seeing intermittent 502 responses on the primary ALB while testing. We are not seeing any 502s logged on ALB2, so possibly the NLB is terminating the connections, as the NLB metrics show a rising target reset count. Also, nothing is getting logged in the application logs.

We ran another test where we routed traffic directly to ALB2 through R53, and we didn't see any 502 responses there.

Any suggestions on how to go about debugging this?

ashish bustler

1 Answer

I think I have the answer to my problem now, so I am sharing it for a wider audience. The intermittent 502s were caused by inconsistent idle_timeout_value settings across the LBs and the backend application.

Since the NLB's idle_timeout_value is set to 350 seconds by default and can't be changed, we had inconsistent values across the LBs: the first ALB and the last ALB were both set to 600 seconds. Ideally the application should have the highest idle_timeout_value, and each LB in front of it should be set lower than the hop behind it; otherwise a front LB can reuse a connection that the hop behind it has already dropped. Setting the first ALB to 300 seconds and the second ALB to 500 seconds solved the problem, and we haven't seen a single 502 since.
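The ordering rule can be checked mechanically. Below is a minimal Python sketch, not part of our actual setup: the function name `find_timeout_violations` and the list layout are made up for illustration, and the only numbers used are the ones mentioned above. The application's own timeout is left out because I haven't stated its value here; in practice it would be the last entry in the chain.

```python
from typing import List, Tuple


def find_timeout_violations(chain: List[Tuple[str, int]]) -> List[str]:
    """Check a chain of hops ordered from the client side to the backend.

    Each entry is (name, idle timeout in seconds). A hop is flagged when its
    timeout is not strictly lower than the timeout of the hop behind it.
    """
    problems = []
    # Pair every hop with the hop directly behind it.
    for (front, front_t), (back, back_t) in zip(chain, chain[1:]):
        if front_t >= back_t:
            problems.append(
                f"{front} ({front_t}s) should be lower than {back} ({back_t}s)"
            )
    return problems


# Values before the fix: both ALBs at 600s, NLB fixed at 350s.
before = [("ALB1", 600), ("NLB", 350), ("ALB2", 600)]
# Values after the fix.
after = [("ALB1", 300), ("NLB", 350), ("ALB2", 500)]

print(find_timeout_violations(before))
print(find_timeout_violations(after))
```

With the original values only the ALB1 to NLB hop is flagged, which matches where the target resets showed up; with the new values the list comes back empty.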

ashish bustler