Customize retry behaviour with Ribbon and Hystrix

Question

Objective

My task is to write an API gateway and load balancer with the following objectives:

The gateway/LB should redirect requests to instances of a 3rd party service (its code cannot be changed, so service discovery has to be client-side).
Each service instance can process only a single request at a time; a concurrent request gets an immediate error response.
Service response latency is 0-5 seconds. I can't cache the responses, so as I understand it a fallback is not an option for me. A timeout is not an option either, because the latency is random and there is no guarantee of a better one on another instance.
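
To make the second and third constraints concrete, here is a minimal Java sketch of one such instance. The class name `SingleSlotInstance`, the `isBusy()` helper and the 503/500 status codes are assumptions for illustration only; the real 3rd party service may signal "busy" differently.

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.ThreadLocalRandom;

// Hypothetical stand-in for one instance of the 3rd party service.
class SingleSlotInstance {
    private final Semaphore slot = new Semaphore(1); // one request at a time
    private final long minLatencyMillis;
    private final long maxLatencyMillis;

    SingleSlotInstance(long minLatencyMillis, long maxLatencyMillis) {
        this.minLatencyMillis = minLatencyMillis;
        this.maxLatencyMillis = maxLatencyMillis;
    }

    /** True while a request is being processed. */
    boolean isBusy() {
        return slot.availablePermits() == 0;
    }

    /** Returns 200 after a random delay, or 503 immediately if the instance is busy. */
    int handle() {
        if (!slot.tryAcquire()) {
            return 503; // assumed "busy" status; rejected without waiting
        }
        try {
            // Random latency in [minLatencyMillis, maxLatencyMillis]
            Thread.sleep(ThreadLocalRandom.current()
                    .nextLong(minLatencyMillis, maxLatencyMillis + 1));
            return 200;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return 500; // assumed status for an aborted request
        } finally {
            slot.release();
        }
    }
}
```

With `new SingleSlotInstance(0, 5000)`, a second caller arriving while the first is still being served gets the error at once, which is the situation the gateway has to handle.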

My solution

Spring Boot Cloud Netflix: Zuul-Hystrix-Ribbon. Two approaches:

Retry. Ribbon retry with a fixed interval or exponential backoff. I failed to make it work; the best result I achieved is MaxAutoRetriesNextServer: 1000, where Ribbon fires retries immediately and spams the downstream services.
Circuit Breaker. Instead of adding an exponential wait period in Ribbon, I can open the circuit after a few failures and redirect requests to other service instances. This is also not the best approach, for two reasons: a) with only a few instances, each with 0-5 sec latency, all circuits open very quickly and the request fails to be served; b) my configuration doesn't work for some reason.
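
For clarity, this is the waiting behaviour the Retry approach is after, sketched in plain Java outside of Ribbon. It does not show how to configure Ribbon; the class `BackoffRetry`, its method names and its parameters are hypothetical, and the retryable codes are simply copied from `retryableStatusCodes` in the config.

```java
import java.util.function.IntSupplier;

// Hypothetical helper: retry with exponential wait between attempts.
class BackoffRetry {

    /** Wait before retry number `attempt` (0-based): base * 2^attempt, capped at maxMillis. */
    static long delayMillis(int attempt, long baseMillis, long maxMillis) {
        // The shift is bounded so small base values cannot overflow a long
        long delay = baseMillis << Math.min(attempt, 20);
        return Math.min(delay, maxMillis);
    }

    /** Same codes as retryableStatusCodes in the configuration. */
    static boolean isRetryable(int status) {
        return status == 404 || status == 502 || status == 503 || status == 504;
    }

    /** Calls `request` up to maxAttempts times, sleeping between retryable failures. */
    static int callWithBackoff(IntSupplier request, int maxAttempts,
                               long baseMillis, long maxMillis) {
        int status = request.getAsInt();
        for (int attempt = 0; attempt < maxAttempts - 1 && isRetryable(status); attempt++) {
            sleep(delayMillis(attempt, baseMillis, maxMillis));
            status = request.getAsInt();
        }
        return status; // last status seen, retryable or not
    }

    private static void sleep(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // keep the interrupt flag, stop waiting
        }
    }
}
```

With a base of 100 ms and a cap of 5000 ms the waits would be 100, 200, 400, 800 ms and so on, instead of the immediate retries described above.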

Question

How can I make Ribbon wait between retries? Or can I solve my problem with Circuit Breaker?

My configuration

The full config can be found on GitHub.

ribbon:
  eureka:
    enabled: false
  # Obsolete option (Apache HttpClient is the default), but without it Ribbon doesn't retry against other instances
  restclient:
    enabled: true

hystrix:
  command:
    my-service:
      circuitBreaker:
        sleepWindowInMilliseconds: 3000
        errorThresholdPercentage: 50
        requestVolumeThreshold: 5
      execution:
        isolation:
          thread:
            timeoutInMilliseconds: 5500

my-service:
  ribbon:
    OkToRetryOnAllOperations: false
    NFLoadBalancerRuleClassName: com.netflix.loadbalancer.WeightedResponseTimeRule
    listOfServers: ${LIST_OF_SERVERS}
    ConnectTimeout: 500
    ReadTimeout: 4500
    MaxAutoRetries: 0
    MaxAutoRetriesNextServer: 1000
    retryableStatusCodes: 404,502,503,504
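
As I understand the circuitBreaker settings above, the circuit for my-service should open once at least requestVolumeThreshold requests were seen in the rolling statistics window and the share of failed ones reaches errorThresholdPercentage. The sketch below is a simplified illustration of that rule, not Hystrix's actual code; `CircuitTripSketch` and `shouldOpen` are hypothetical names.

```java
// Simplified illustration of the trip rule implied by the circuitBreaker settings.
class CircuitTripSketch {

    static boolean shouldOpen(int totalRequests, int failedRequests,
                              int requestVolumeThreshold, int errorThresholdPercentage) {
        if (totalRequests < requestVolumeThreshold) {
            return false; // too few requests in the window to decide
        }
        // Integer percentage, rounded down
        int errorPercentage = failedRequests * 100 / totalRequests;
        return errorPercentage >= errorThresholdPercentage;
    }

    public static void main(String[] args) {
        // Values from the config: requestVolumeThreshold 5, errorThresholdPercentage 50
        System.out.println(shouldOpen(4, 4, 5, 50)); // false: below the volume threshold
        System.out.println(shouldOpen(5, 3, 5, 50)); // true: 60% of 5 requests failed
    }
}
```

Under this reading, three "busy" errors out of five requests are enough to open the circuit for sleepWindowInMilliseconds (3 seconds here), which is why reason a) above bites so quickly.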

Tests

To check your assumptions, you can play with the test on GitHub, which simulates single-threaded service instances with different latencies.