I'm trying to enable retry capability within a Zuul gateway. Everything works locally, but when I deploy the gateway to PCF, I get the following error when zuul.retryable=true:
{
  "timestamp": 1524669167094,
  "status": 500,
  "error": "Internal Server Error",
  "exception": "com.netflix.zuul.exception.ZuulException",
  "message": "COMMAND_EXCEPTION"
}
The related logs give me the following exception details:
com.netflix.zuul.exception.ZuulException: Forwarding error
Caused by: com.netflix.hystrix.exception.HystrixRuntimeException: spring-demo failed and no fallback available.
Caused by: org.apache.http.NoHttpResponseException: spring-demo.example.com:443 failed to respond
I've tested spring-demo.example.com directly and it responds correctly (200) within 200 ms. Zuul is also able to get a valid response when I remove the zuul.retryable property (although then it doesn't retry any error status codes or timeouts).
When I run locally, I can see the RibbonLoadBalancedRetryPolicy try the different instances on a timeout or when getting a 500, so it's only in PCF that I'm getting the error. I've verified that the instances show up in the PCF Eureka, and I've also tried increasing the connect, read, and Hystrix timeouts.
Here's the service layout:
- 2 instances of "working" app connected to Eureka as "spring-demo"
- 2 instances of "broken" app connected to Eureka as "spring-demo" (times out or returns 500)
- Zuul connected to Eureka
Zuul application.yml:
zuul:
  ignoredServices: '*'
  ignoredPatterns: '/**/actuator/**'
  retryable: true
  routes:
    spring-demo: '/spring-demo/**'
ribbon:
  retryableStatusCodes: 404, 500
  MaxAutoRetries: 1
  MaxAutoRetriesNextServer: 5
  OkRetryOnConnectionErrors: true
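To make the expected behavior of the ribbon settings above concrete, here is a rough plain-Java sketch of the retry budget they imply. It is not Ribbon's actual implementation: the class and method names are made up for illustration, and it assumes simple round-robin selection across the four "spring-demo" instances, with every failure (timeout or 500) treated as retryable.

```java
import java.util.List;
import java.util.function.Predicate;

public class RetrySketch {
    // Values copied from the ribbon section of application.yml
    static final int MAX_AUTO_RETRIES = 1;             // extra tries on the same instance
    static final int MAX_AUTO_RETRIES_NEXT_SERVER = 5; // extra instances to move on to

    // Upper bound on attempts: (1 + 1) * (5 + 1) = 12
    static int maxAttempts() {
        return (MAX_AUTO_RETRIES + 1) * (MAX_AUTO_RETRIES_NEXT_SERVER + 1);
    }

    // Returns how many attempts were made before a call succeeded,
    // or -1 if the whole retry budget was used up without success.
    static int attemptsUntilSuccess(List<String> servers, Predicate<String> callSucceeds) {
        int attempts = 0;
        for (int next = 0; next <= MAX_AUTO_RETRIES_NEXT_SERVER; next++) {
            // Assumption: instances are picked round-robin
            String server = servers.get(next % servers.size());
            for (int same = 0; same <= MAX_AUTO_RETRIES; same++) {
                attempts++;
                if (callSucceeds.test(server)) {
                    return attempts;
                }
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // Hypothetical instance names matching the layout: 2 working, 2 broken
        List<String> servers = List.of("broken-1", "working-1", "broken-2", "working-2");
        int attempts = attemptsUntilSuccess(servers, s -> s.startsWith("working"));
        System.out.println("attempts=" + attempts + ", max=" + maxAttempts());
    }
}
```

With this layout, the sketch tries the first broken instance twice and then succeeds on a working one, which matches what I see locally. In PCF the request fails with NoHttpResponseException instead of moving on to another instance.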
Gradle dependency versions:
- Spring Boot 1.5.12.RELEASE
- Spring Cloud Edgware.SR3
- Pivotal Services 1.6.3.RELEASE
- spring-boot-starter-web
- spring-boot-starter-actuator
- spring-cloud-starter-netflix-zuul
- spring-retry
- spring-cloud-services-starter-service-registry
- spring-cloud-services-starter-circuit-breaker