
I have the following code, which handles some WebSocket messages:

Flux.create(sink -> {
        handler//.log("handler")
               .doOnNext(s -> log.info("propagating: {}", s))
               .doOnNext(sink::next)
               .doOnError(sink::error)
               .onErrorComplete() // silence the error in this inner chain; the outer Flux already received it via sink.error and handles it.
               .publishOn(Schedulers.boundedElastic())
               .subscribeOn(Schedulers.boundedElastic(), false)
               .subscribe();
    })
    .take(10)
    .onErrorResume(e -> Exceptions.unwrap(e) instanceof Errors.NativeIoException,
                   t -> {
                       log.error("Websocket connection error: {}", t.getMessage());
                       log.debug("{}", t.fillInStackTrace().toString());
                       // the error is handled and hidden by retryWhen.
                       return Mono.error(t);
                   })
    .retryWhen(Retry.backoff(10000, Duration.ofSeconds(3))
                    .maxBackoff(Duration.ofSeconds(3))
                    .transientErrors(true)
                    .doBeforeRetry(s -> log.error("Retrying connection to {}", "abc"))
                    .doAfterRetry(s -> log.error("Attempt {}/{} to restore connection to {} failed with {}",
                                                 s.totalRetriesInARow(), 10000, "abc", s.failure().getMessage()))
              );

Every now and then the connection drops, which is why there's a retryWhen operator in the pipeline. As you can see above, I print messages to the console that report the connection drop and how many times it has retried.

However, I haven't been able to figure out how to print a "recovered" message (i.e. "Connection to X restored"). Am I missing something in the docs, or am I expected to write a custom RetryBackoffSpec that does this?

Example log output:

03:11:34.205 10-02-2023 | INFO  | boundedElastic-6     | org.example.chaos.simulation.NetworkDisruptionTest | propagating: {"timestamp":1675991494202,"hello":"world!"}
03:11:34.703 10-02-2023 | INFO  | boundedElastic-6     | org.example.chaos.simulation.NetworkDisruptionTest | propagating: {"timestamp":1675991494702,"hello":"world!"}
03:11:35.205 10-02-2023 | INFO  | boundedElastic-6     | org.example.chaos.simulation.NetworkDisruptionTest | propagating: {"timestamp":1675991495203,"hello":"world!"}
03:11:35.704 10-02-2023 | INFO  | boundedElastic-6     | org.example.chaos.simulation.NetworkDisruptionTest | propagating: {"timestamp":1675991495703,"hello":"world!"}
03:11:46.746 10-02-2023 | ERROR | boundedElastic-6     | org.example.chaos.simulation.NetworkDisruptionTest | Websocket connection error: recvAddress(..) failed: Connection timed out
03:11:46.749 10-02-2023 | ERROR | boundedElastic-6     | org.example.chaos.simulation.NetworkDisruptionTest | Retrying connection to abc
03:11:49.752 10-02-2023 | ERROR | parallel-3           | org.example.chaos.simulation.NetworkDisruptionTest | Attempt 0/10000 to restore connection to abc failed with recvAddress(..) failed: Connection timed out
03:11:52.763 10-02-2023 | ERROR | boundedElastic-5     | org.example.chaos.simulation.NetworkDisruptionTest | Retrying connection to abc
03:11:55.764 10-02-2023 | ERROR | parallel-4           | org.example.chaos.simulation.NetworkDisruptionTest | Attempt 1/10000 to restore connection to abc failed with connection timed out: /172.25.0.2:8090
03:11:58.772 10-02-2023 | ERROR | boundedElastic-3     | org.example.chaos.simulation.NetworkDisruptionTest | Retrying connection to abc
03:12:01.773 10-02-2023 | ERROR | parallel-5           | org.example.chaos.simulation.NetworkDisruptionTest | Attempt 2/10000 to restore connection to abc failed with connection timed out: /172.25.0.2:8090
--- A message such as "Connection to abc has been restored." is expected to appear here.
    Perhaps [this answer](https://stackoverflow.com/a/67962000/1765851) provides a solution to your problem? – Sven Feb 10 '23 at 04:32
  • Yeah, that's pretty much what I needed -- I had to wrap it in a Flux.defer to avoid concurrent connections overwriting the "retrying" flag. Thanks! Feel free to add an answer and I'll mark it as accepted. – tftd Feb 10 '23 at 13:39
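
Edit for future readers: the approach from the linked answer worked once the per-connection state was wrapped in a Flux.defer. A rough sketch of the idea (not the exact code I run; `retrying` is a hand-rolled java.util.concurrent.atomic.AtomicBoolean, `Retry` is reactor.util.retry.Retry, and `"abc"` is the placeholder name from above):

Flux.defer(() -> {
        // One flag per subscription, so concurrent connections don't overwrite each other's state.
        AtomicBoolean retrying = new AtomicBoolean(false);
        return Flux.<String>create(sink -> { /* same handler bridge as above */ })
                   .doOnNext(msg -> {
                       // First element after a successful resubscription means the connection is back.
                       if (retrying.compareAndSet(true, false)) {
                           log.info("Connection to {} has been restored.", "abc");
                       }
                   })
                   .retryWhen(Retry.backoff(10000, Duration.ofSeconds(3))
                                   .maxBackoff(Duration.ofSeconds(3))
                                   .transientErrors(true)
                                   .doBeforeRetry(s -> {
                                       retrying.set(true);
                                       log.error("Retrying connection to {}", "abc");
                                   }));
    });

Because retryWhen resubscribes to its own upstream while the Flux.defer supplier runs only once per outer subscription, the flag survives retries within a single connection but stays isolated between concurrent subscriptions.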
