I'm using Spring Boot 3.0.5 and Spring Cloud Release Train 2022.0.2.
My current architecture is in the picture below: I am attempting to use a load balanced discovery service to route incoming external requests to an internal service.
I've been having issues where, if I start three instances of "External Service", they all register with Eureka, but if one of them goes down, subsequent requests keep round-robining across all three and eventually hit the down instance.
After upgrading all of my projects to Spring Cloud 2022.0.2, the discovery service / load balancer in the gateway service does eventually heal, but only after a non-trivial amount of time.
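To put a rough number on "non-trivial", here is a back-of-the-envelope sketch of how the individual caches and timers might stack up when an instance dies without deregistering. The property names and default values in it are my understanding of the Eureka and Spring Cloud LoadBalancer defaults, not something I have verified against my exact versions, and the class `StaleInstanceWindow` is made up for illustration. The 30-second heartbeat and registry-fetch cadence at least matches the timestamps in the log further down.

```java
import java.time.Duration;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper for illustration; not part of Spring Cloud or Eureka.
class StaleInstanceWindow {

    // Assumed default intervals; verify each one against the versions in use.
    static Map<String, Duration> assumedDefaults() {
        Map<String, Duration> stages = new LinkedHashMap<>();
        // Server waits this long without a heartbeat before the lease counts as expired.
        stages.put("eureka.instance.lease-expiration-duration-in-seconds", Duration.ofSeconds(90));
        // Expired leases are only removed when the eviction task runs.
        stages.put("eureka.server.eviction-interval-timer-in-ms", Duration.ofMillis(60_000));
        // The server answers registry fetches from a periodically refreshed cache.
        stages.put("eureka.server.response-cache-update-interval-ms", Duration.ofMillis(30_000));
        // The gateway's Eureka client polls the registry on its own schedule.
        stages.put("eureka.client.registry-fetch-interval-seconds", Duration.ofSeconds(30));
        // Spring Cloud LoadBalancer caches the instance list on top of that.
        stages.put("spring.cloud.loadbalancer.cache.ttl", Duration.ofSeconds(35));
        return stages;
    }

    // Worst case: every stage has just refreshed, so each full interval is paid in turn.
    static Duration worstCase(Map<String, Duration> stages) {
        Duration total = Duration.ZERO;
        for (Duration stage : stages.values()) {
            total = total.plus(stage);
        }
        return total;
    }

    public static void main(String[] args) {
        Map<String, Duration> stages = assumedDefaults();
        stages.forEach((name, d) -> System.out.println(name + " = " + d.getSeconds() + "s"));
        System.out.println("worst case = " + worstCase(stages).getSeconds() + "s");
    }
}
```

With these assumed values the stages add up to 245 seconds, so roughly four minutes in the worst case, and that is before self-preservation on the server is taken into account.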
Things I've tried:
- I tried creating my own load balancer client as described here, but I couldn't figure out how to make the gateway use my custom-configured load balancer. (I especially didn't understand the purpose of the client service id in @LoadBalancerClient(value = "stores", configuration = CustomLoadBalancerConfiguration.class).)
- I tried switching to the pre-made health-check load balancer using the following configuration:
    loadbalancer:
      configurations: health-check
For some reason when I do this, the external service can't be found when I make requests to the gateway.
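For context, this is roughly what I understand the Spring Cloud LoadBalancer docs to be describing for the custom-configuration route. It is a sketch under assumptions, not something I have working: `GatewayLoadBalancerSetup` is a name I made up, and I am assuming that the `value` of `@LoadBalancerClient` is the service id the configuration applies to, while `@LoadBalancerClients(defaultConfiguration = ...)` applies it to every service id. As far as I can tell, the health-check supplier probes `/actuator/health` on each instance by default (overridable via `spring.cloud.loadbalancer.health-check.path.default`), so if the external service does not expose that endpoint, every instance may be filtered out, which could be why the service "can't be found".

```java
import org.springframework.cloud.loadbalancer.annotation.LoadBalancerClients;
import org.springframework.cloud.loadbalancer.core.ServiceInstanceListSupplier;
import org.springframework.context.ConfigurableApplicationContext;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

// Load balancer configuration class. Per the docs, as I read them, this class
// should NOT itself be annotated with @Configuration inside a component-scanned
// package, otherwise it is applied in unintended ways.
class CustomLoadBalancerConfiguration {

    // Instances come from the (reactive) discovery client and are then filtered
    // by an HTTP health check before the load balancer picks one.
    @Bean
    ServiceInstanceListSupplier healthCheckedInstances(ConfigurableApplicationContext context) {
        return ServiceInstanceListSupplier.builder()
                .withDiscoveryClient()
                .withHealthChecks()
                .build(context);
    }
}

// Hypothetical class in the gateway application; the name is an assumption.
// defaultConfiguration applies the configuration above to all service ids,
// rather than to a single one such as "stores".
@Configuration
@LoadBalancerClients(defaultConfiguration = CustomLoadBalancerConfiguration.class)
class GatewayLoadBalancerSetup {
}
```

If this is right, the two top-level classes would live in separate files in a real project; they are shown together here only to keep the sketch short.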
What is the recommended way to set up the Eureka server and Eureka clients in applications so that requests are load balanced and instances that go down are no longer routed to?
Below is an example screenshot of my Eureka dashboard and a console log of the Gateway service from when requests were being routed to a down external service. I set the enable-self-preservation option to false in the Eureka server, and it now seems to evict down services some of the time, but the dashboard shows a scary warning.
server:
  enable-self-preservation: false
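My tentative understanding of why self-preservation matters so much for a small setup like mine is sketched below. The formula is my reading of how the Eureka server derives its renewal threshold (expected clients times renewals per minute times `eureka.server.renewal-percent-threshold`, which I believe defaults to 0.85), so treat it as an assumption; `SelfPreservationSketch` and its methods are invented for illustration. The figure of four registered instances comes from the "total number of all instances in the client now is 4" line in the log below.

```java
// Hypothetical illustration; not Eureka's actual classes or method names.
class SelfPreservationSketch {

    // Assumed formula: clients * renewals-per-minute * percent threshold, truncated to int.
    static int renewsPerMinThreshold(int expectedClients,
                                     int renewalIntervalSeconds,
                                     double renewalPercentThreshold) {
        return (int) (expectedClients * (60.0 / renewalIntervalSeconds) * renewalPercentThreshold);
    }

    // Assumed rule: leases are only evicted while renewals stay strictly above the threshold.
    static boolean evictionAllowed(int renewsLastMinute, int threshold) {
        return threshold > 0 && renewsLastMinute > threshold;
    }

    public static void main(String[] args) {
        int threshold = renewsPerMinThreshold(4, 30, 0.85);
        int renewsWithAllUp = 4 * 2;   // four instances, one heartbeat every 30 s
        int renewsWithOneDown = 3 * 2; // one instance has stopped sending heartbeats
        System.out.println("threshold = " + threshold);
        System.out.println("all up, eviction allowed = " + evictionAllowed(renewsWithAllUp, threshold));
        System.out.println("one down, eviction allowed = " + evictionAllowed(renewsWithOneDown, threshold));
    }
}
```

Under these assumptions the threshold works out to 6 renewals per minute, and losing a single instance drops the renewals to exactly 6, which is not strictly above the threshold. If that reading is correct, the server would enter self-preservation and stop evicting precisely when one of my four instances dies, which would explain why disabling it changed the behaviour.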
So far Eureka integration with Spring Boot has been very opaque: the properties available don't have much explanation on the Spring Docs website. Am I missing something or some source of information?
Stack Trace:
io.netty.channel.AbstractChannel$AnnotatedConnectException: Connection refused: /172.64.0.29:9007
Suppressed: reactor.core.publisher.FluxOnAssembly$OnAssemblyException:
Error has been observed at the following site(s):
*__checkpoint ⇢ org.springframework.cloud.gateway.filter.WeightCalculatorWebFilter [DefaultWebFilterChain]
*__checkpoint ⇢ org.springframework.web.filter.reactive.ServerHttpObservationFilter [DefaultWebFilterChain]
*__checkpoint ⇢ HTTP POST "/ticketing/v1/stop" [ExceptionHandlingWebHandler]
Original Stack Trace:
Caused by: java.net.ConnectException: Connection refused
at java.base/sun.nio.ch.Net.pollConnect(Native Method) ~[na:na]
at java.base/sun.nio.ch.Net.pollConnectNow(Net.java:672) ~[na:na]
at java.base/sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:973) ~[na:na]
at io.netty.channel.socket.nio.NioSocketChannel.doFinishConnect(NioSocketChannel.java:337) ~[netty-transport-4.1.90.Final.jar:4.1.90.Final]
at io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.finishConnect(AbstractNioChannel.java:334) ~[netty-transport-4.1.90.Final.jar:4.1.90.Final]
at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:776) ~[netty-transport-4.1.90.Final.jar:4.1.90.Final]
at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:724) ~[netty-transport-4.1.90.Final.jar:4.1.90.Final]
at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:650) ~[netty-transport-4.1.90.Final.jar:4.1.90.Final]
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:562) ~[netty-transport-4.1.90.Final.jar:4.1.90.Final]
at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:997) ~[netty-common-4.1.90.Final.jar:4.1.90.Final]
at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74) ~[netty-common-4.1.90.Final.jar:4.1.90.Final]
at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) ~[netty-common-4.1.90.Final.jar:4.1.90.Final]
at java.base/java.lang.Thread.run(Thread.java:1589) ~[na:na]
2023-04-04T16:24:41.184-04:00 DEBUG 53465 --- [tbeatExecutor-0] com.netflix.discovery.DiscoveryClient : DiscoveryClient_DSIMS-GATEWAY/172.64.0.29:dsims-gateway:80 - Heartbeat status: 200
2023-04-04T16:24:41.507-04:00 DEBUG 53465 --- [freshExecutor-0] com.netflix.discovery.DiscoveryClient : Got delta update with apps hashcode DOWN_1_UP_3_
2023-04-04T16:24:41.508-04:00 DEBUG 53465 --- [freshExecutor-0] com.netflix.discovery.DiscoveryClient : Added instance 172.64.0.29:rampaging-railgun:9009 to the existing apps in region null
2023-04-04T16:24:41.508-04:00 DEBUG 53465 --- [freshExecutor-0] com.netflix.discovery.DiscoveryClient : Added instance 172.64.0.29:rampaging-railgun:9008 to the existing apps in region null
2023-04-04T16:24:41.508-04:00 DEBUG 53465 --- [freshExecutor-0] com.netflix.discovery.DiscoveryClient : Added instance 172.64.0.29:rampaging-railgun:9007 to the existing apps in region null
2023-04-04T16:24:41.509-04:00 DEBUG 53465 --- [freshExecutor-0] com.netflix.discovery.DiscoveryClient : The total number of instances fetched by the delta processor : 3
2023-04-04T16:24:41.511-04:00 DEBUG 53465 --- [freshExecutor-0] com.netflix.discovery.DiscoveryClient : The total number of all instances in the client now is 4
2023-04-04T16:24:41.512-04:00 TRACE 53465 --- [freshExecutor-0] o.s.c.netflix.eureka.CloudEurekaClient : onCacheRefreshed called with count: 18
2023-04-04T16:24:41.530-04:00 DEBUG 53465 --- [freshExecutor-0] com.netflix.discovery.DiscoveryClient : Completed cache refresh task for discovery. All Apps hash code is Local region apps hashcode: DOWN_1_UP_3_, is fetching remote regions? false
2023-04-04T16:25:11.192-04:00 DEBUG 53465 --- [tbeatExecutor-0] com.netflix.discovery.DiscoveryClient : DiscoveryClient_DSIMS-GATEWAY/172.64.0.29:dsims-gateway:80 - Heartbeat status: 200
2023-04-04T16:25:11.537-04:00 DEBUG 53465 --- [freshExecutor-0] com.netflix.discovery.DiscoveryClient : Got delta update with apps hashcode DOWN_1_UP_3_
2023-04-04T16:25:11.537-04:00 DEBUG 53465 --- [freshExecutor-0] com.netflix.discovery.DiscoveryClient : Added instance 172.64.0.29:rampaging-railgun:9009 to the existing apps in region null
2023-04-04T16:25:11.537-04:00 DEBUG 53465 --- [freshExecutor-0] com.netflix.discovery.DiscoveryClient : Added instance 172.64.0.29:rampaging-railgun:9008 to the existing apps in region null
2023-04-04T16:25:11.537-04:00 DEBUG 53465 --- [freshExecutor-0] com.netflix.discovery.DiscoveryClient : Added instance 172.64.0.29:rampaging-railgun:9007 to the existing apps in region null
2023-04-04T16:25:11.537-04:00 DEBUG 53465 --- [freshExecutor-0] com.netflix.discovery.DiscoveryClient : The total number of instances fetched by the delta processor : 3
2023-04-04T16:25:11.537-04:00 DEBUG 53465 --- [freshExecutor-0] com.netflix.discovery.DiscoveryClient : The total number of all instances in the client now is 4
2023-04-04T16:25:11.538-04:00 TRACE 53465 --- [freshExecutor-0] o.s.c.netflix.eureka.CloudEurekaClient : onCacheRefreshed called with count: 19
2023-04-04T16:25:11.548-04:00 DEBUG 53465 --- [freshExecutor-0] com.netflix.discovery.DiscoveryClient : Completed cache refresh task for discovery. All Apps hash code is Local region apps hashcode: DOWN_1_UP_3_, is fetching remote regions? false