
I'm using Spring Boot 3.0.5 and Spring Cloud Release Train 2022.0.2.

My current architecture is in the picture below: I am attempting to use a load balanced discovery service to route incoming external requests to an internal service.

[Image: architecture diagram]

I've been having an issue: if I create three instances of "External Service", they all register with Eureka, but when one of them goes down, subsequent requests are still round-robined across all three and eventually hit the down instance.

After upgrading all of my projects to Spring Cloud 2022.0.2, the discovery service / load balancer in the gateway service does eventually heal, but only after a non-trivial amount of time.

Things I've tried:

  1. I tried creating my own load balancer client as described here, but I couldn't figure out how to make the gateway use my custom-configured load balancer. (In particular, I didn't understand the purpose of the client service id in @LoadBalancerClient(value = "stores", configuration = CustomLoadBalancerConfiguration.class).)
  2. I tried switching to the pre-made health-check load balancer using the following configuration:
    loadbalancer:
      configurations: health-check

For some reason when I do this, the external service can't be found when I make requests to the gateway.
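
For reference, a fuller sketch of what the gateway's health-check configuration might look like. It assumes the external services include Spring Boot Actuator and expose `/actuator/health`; if they don't, the probes fail and no instance is treated as alive, which would be one possible explanation for the "service can't be found" symptom. The interval values are illustrative, not recommendations.

```yaml
# Gateway application.yml (sketch; values are assumptions)
spring:
  cloud:
    loadbalancer:
      configurations: health-check
      health-check:
        path:
          default: /actuator/health      # endpoint probed on each instance
        interval: 10s                    # how often instances are re-checked
        refetch-instances: true          # re-read the instance list from discovery
        refetch-instances-interval: 10s  # how often the list is re-read
```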

What is the recommended way to setup the Eureka server and Eureka clients in applications so that load balancing occurs when making requests and services that go down aren't routed to?

Below is an example screenshot of my Eureka dashboard and a console log of the Gateway service from when requests were being routed to a down external service. I set the enable-self-preservation option to false in the Eureka server, and it now sometimes evicts down services, but the dashboard shows a scary warning.

  server:
    enable-self-preservation: false
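
The delay before a stopped instance stops receiving traffic is typically the combined effect of several timers: the instance's lease expiry, the server's eviction task and response cache, the gateway's registry fetch, and the load balancer's own cache. The gateway log in this post shows a registry refresh every 30 seconds, which matches the usual default. A sketch of the properties involved, with illustrative values only; the comments note which application each fragment is assumed to belong to.

```yaml
# Eureka server
eureka:
  server:
    enable-self-preservation: false          # allow eviction even when many heartbeats are missed
    eviction-interval-timer-in-ms: 5000      # how often the eviction task runs
    response-cache-update-interval-ms: 5000  # how often the read-only response cache is refreshed
---
# Each external service instance
eureka:
  instance:
    lease-renewal-interval-in-seconds: 5      # heartbeat frequency
    lease-expiration-duration-in-seconds: 15  # time without a heartbeat before the lease expires
---
# Gateway
eureka:
  client:
    registry-fetch-interval-seconds: 5  # how often the gateway pulls the registry
spring:
  cloud:
    loadbalancer:
      cache:
        ttl: 5s  # lifetime of the load balancer's cached instance list
```

Shorter intervals mean more traffic to the Eureka server, so these values are better treated as a starting point for a development setup than as production settings.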

So far Eureka integration with Spring Boot has been very opaque: the properties available don't have much explanation on the Spring Docs website. Am I missing something or some source of information?

Stack Trace:


io.netty.channel.AbstractChannel$AnnotatedConnectException: Connection refused: /172.64.0.29:9007
    Suppressed: reactor.core.publisher.FluxOnAssembly$OnAssemblyException: 
Error has been observed at the following site(s):
    *__checkpoint ⇢ org.springframework.cloud.gateway.filter.WeightCalculatorWebFilter [DefaultWebFilterChain]
    *__checkpoint ⇢ org.springframework.web.filter.reactive.ServerHttpObservationFilter [DefaultWebFilterChain]
    *__checkpoint ⇢ HTTP POST "/ticketing/v1/stop" [ExceptionHandlingWebHandler]
Original Stack Trace:
Caused by: java.net.ConnectException: Connection refused
    at java.base/sun.nio.ch.Net.pollConnect(Native Method) ~[na:na]
    at java.base/sun.nio.ch.Net.pollConnectNow(Net.java:672) ~[na:na]
    at java.base/sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:973) ~[na:na]
    at io.netty.channel.socket.nio.NioSocketChannel.doFinishConnect(NioSocketChannel.java:337) ~[netty-transport-4.1.90.Final.jar:4.1.90.Final]
    at io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.finishConnect(AbstractNioChannel.java:334) ~[netty-transport-4.1.90.Final.jar:4.1.90.Final]
    at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:776) ~[netty-transport-4.1.90.Final.jar:4.1.90.Final]
    at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:724) ~[netty-transport-4.1.90.Final.jar:4.1.90.Final]
    at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:650) ~[netty-transport-4.1.90.Final.jar:4.1.90.Final]
    at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:562) ~[netty-transport-4.1.90.Final.jar:4.1.90.Final]
    at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:997) ~[netty-common-4.1.90.Final.jar:4.1.90.Final]
    at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74) ~[netty-common-4.1.90.Final.jar:4.1.90.Final]
    at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) ~[netty-common-4.1.90.Final.jar:4.1.90.Final]
    at java.base/java.lang.Thread.run(Thread.java:1589) ~[na:na]

2023-04-04T16:24:41.184-04:00 DEBUG 53465 --- [tbeatExecutor-0] com.netflix.discovery.DiscoveryClient    : DiscoveryClient_DSIMS-GATEWAY/172.64.0.29:dsims-gateway:80 - Heartbeat status: 200
2023-04-04T16:24:41.507-04:00 DEBUG 53465 --- [freshExecutor-0] com.netflix.discovery.DiscoveryClient    : Got delta update with apps hashcode DOWN_1_UP_3_
2023-04-04T16:24:41.508-04:00 DEBUG 53465 --- [freshExecutor-0] com.netflix.discovery.DiscoveryClient    : Added instance 172.64.0.29:rampaging-railgun:9009 to the existing apps in region null
2023-04-04T16:24:41.508-04:00 DEBUG 53465 --- [freshExecutor-0] com.netflix.discovery.DiscoveryClient    : Added instance 172.64.0.29:rampaging-railgun:9008 to the existing apps in region null
2023-04-04T16:24:41.508-04:00 DEBUG 53465 --- [freshExecutor-0] com.netflix.discovery.DiscoveryClient    : Added instance 172.64.0.29:rampaging-railgun:9007 to the existing apps in region null
2023-04-04T16:24:41.509-04:00 DEBUG 53465 --- [freshExecutor-0] com.netflix.discovery.DiscoveryClient    : The total number of instances fetched by the delta processor : 3
2023-04-04T16:24:41.511-04:00 DEBUG 53465 --- [freshExecutor-0] com.netflix.discovery.DiscoveryClient    : The total number of all instances in the client now is 4
2023-04-04T16:24:41.512-04:00 TRACE 53465 --- [freshExecutor-0] o.s.c.netflix.eureka.CloudEurekaClient   : onCacheRefreshed called with count: 18
2023-04-04T16:24:41.530-04:00 DEBUG 53465 --- [freshExecutor-0] com.netflix.discovery.DiscoveryClient    : Completed cache refresh task for discovery. All Apps hash code is Local region apps hashcode: DOWN_1_UP_3_, is fetching remote regions? false 
2023-04-04T16:25:11.192-04:00 DEBUG 53465 --- [tbeatExecutor-0] com.netflix.discovery.DiscoveryClient    : DiscoveryClient_DSIMS-GATEWAY/172.64.0.29:dsims-gateway:80 - Heartbeat status: 200
2023-04-04T16:25:11.537-04:00 DEBUG 53465 --- [freshExecutor-0] com.netflix.discovery.DiscoveryClient    : Got delta update with apps hashcode DOWN_1_UP_3_
2023-04-04T16:25:11.537-04:00 DEBUG 53465 --- [freshExecutor-0] com.netflix.discovery.DiscoveryClient    : Added instance 172.64.0.29:rampaging-railgun:9009 to the existing apps in region null
2023-04-04T16:25:11.537-04:00 DEBUG 53465 --- [freshExecutor-0] com.netflix.discovery.DiscoveryClient    : Added instance 172.64.0.29:rampaging-railgun:9008 to the existing apps in region null
2023-04-04T16:25:11.537-04:00 DEBUG 53465 --- [freshExecutor-0] com.netflix.discovery.DiscoveryClient    : Added instance 172.64.0.29:rampaging-railgun:9007 to the existing apps in region null
2023-04-04T16:25:11.537-04:00 DEBUG 53465 --- [freshExecutor-0] com.netflix.discovery.DiscoveryClient    : The total number of instances fetched by the delta processor : 3
2023-04-04T16:25:11.537-04:00 DEBUG 53465 --- [freshExecutor-0] com.netflix.discovery.DiscoveryClient    : The total number of all instances in the client now is 4
2023-04-04T16:25:11.538-04:00 TRACE 53465 --- [freshExecutor-0] o.s.c.netflix.eureka.CloudEurekaClient   : onCacheRefreshed called with count: 19
2023-04-04T16:25:11.548-04:00 DEBUG 53465 --- [freshExecutor-0] com.netflix.discovery.DiscoveryClient    : Completed cache refresh task for discovery. All Apps hash code is Local region apps hashcode: DOWN_1_UP_3_, is fetching remote regions? false 

[Image: Eureka dashboard screenshot]

1 Answer

I had a similar problem: the api-gateway sent requests to a service that had already stopped, so I got Whitelabel error pages.

This link has a lot of useful information. I used the retry pattern from it, and it works perfectly: https://github.com/danielsobrado/spring-cloud-kafka-microservices

Here is my config:

.yml:

spring:
  application:
    name: api-gateway
  cloud:
    gateway:
      discovery:
        locator:
          enabled: true
          lower-case-service-id: true
      default-filters:
        - name: Retry
          args:
            retries: 3
            methods: GET, POST
            series: SERVER_ERROR
            exceptions: java.io.IOException
            backoff:
              factor: 2

Or as .properties (note that this variant also retries PUT and DELETE):

spring.cloud.gateway.default-filters[0].name=Retry
spring.cloud.gateway.default-filters[0].args.retries=3
spring.cloud.gateway.default-filters[0].args.methods=GET, POST, PUT, DELETE
spring.cloud.gateway.default-filters[0].args.series=SERVER_ERROR
spring.cloud.gateway.default-filters[0].args.exceptions=java.io.IOException
spring.cloud.gateway.default-filters[0].args.backoff.factor=2
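
As far as I understand it, this helps with the error in the question because "Connection refused" surfaces as a java.net.ConnectException, which is a subclass of java.io.IOException, and each retry goes back through the load balancer, so it can land on a different instance. If you want to control the wait between attempts, the backoff block accepts more than `factor`. A sketch with explicit backoff bounds; the durations are assumptions chosen for illustration, and retries are limited to GET here because repeating non-idempotent requests can cause duplicates.

```yaml
spring:
  cloud:
    gateway:
      default-filters:
        - name: Retry
          args:
            retries: 3
            methods: GET
            series: SERVER_ERROR
            exceptions: java.io.IOException
            backoff:
              firstBackoff: 50ms           # wait before the first retry
              maxBackoff: 500ms            # upper bound for any single wait
              factor: 2                    # multiplier applied between attempts
              basedOnPreviousValue: false  # compute each wait from firstBackoff and the attempt number
```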

I hope it helps

Danni