
I have a Reactor Kafka application that consumes messages from a topic indefinitely. I need to expose a health check REST endpoint that can indicate the health of this process - essentially, I want to know whether the Kafka receiver flux sequence has terminated so that some action can be taken to restart it. Is there a way to know the current status of a flux (completed/terminated etc.)? The application is Spring WebFlux + Reactor Kafka.

Edit 1 - doOnTerminate/doFinally do not execute

        Flux.range(1, 5)
                .flatMap(record -> Mono.just(record)
                        .map(i -> {
                            // OutOfMemoryError is a fatal JVM error, so Reactor rethrows it
                            // instead of routing it through the operators below
                            throw new OutOfMemoryError("Forcing exception for " + i);
                        })
                        .doOnNext(i -> System.out.println("doOnNext: " + i))
                        .doOnError(e -> System.err.println(e))
                        .onErrorResume(e -> Mono.empty()))
                .doFinally(signalType -> System.err.println("doFinally: Terminating with Signal type: " + signalType))
                .doOnTerminate(() -> System.err.println("doOnTerminate: executed"))
                .subscribe();
"C:\Program Files\Java\jdk1.8.0_211\bin\java.exe" "-javaagent:C:\Program Files\JetBrains\IntelliJ IDEA Community Edition 2019.2.4\lib\idea_rt.jar=52295:C:\Program Files\JetBrains\IntelliJ IDEA Community Edition 2019.2.4\bin" -Dfile.encoding=UTF-8 -classpath "C:\Program Files\Java\jdk1.8.0_211\jre\lib\charsets.jar;C:\Program Files\Java\jdk1.8.0_211\jre\lib\deploy.jar;C:\Program Files\Java\jdk1.8.0_211\jre\lib\ext\access-bridge-64.jar;C:\Program Files\Java\jdk1.8.0_211\jre\lib\ext\cldrdata.jar;C:\Program Files\Java\jdk1.8.0_211\jre\lib\ext\dnsns.jar;C:\Program Files\Java\jdk1.8.0_211\jre\lib\ext\jaccess.jar;C:\Program Files\Java\jdk1.8.0_211\jre\lib\ext\jfxrt.jar;C:\Program Files\Java\jdk1.8.0_211\jre\lib\ext\localedata.jar;C:\Program Files\Java\jdk1.8.0_211\jre\lib\ext\nashorn.jar;C:\Program Files\Java\jdk1.8.0_211\jre\lib\ext\sunec.jar;C:\Program Files\Java\jdk1.8.0_211\jre\lib\ext\sunjce_provider.jar;C:\Program Files\Java\jdk1.8.0_211\jre\lib\ext\sunmscapi.jar;C:\Program Files\Java\jdk1.8.0_211\jre\lib\ext\sunpkcs11.jar;C:\Program Files\Java\jdk1.8.0_211\jre\lib\ext\zipfs.jar;C:\Program Files\Java\jdk1.8.0_211\jre\lib\javaws.jar;C:\Program Files\Java\jdk1.8.0_211\jre\lib\jce.jar;C:\Program Files\Java\jdk1.8.0_211\jre\lib\jfr.jar;C:\Program Files\Java\jdk1.8.0_211\jre\lib\jfxswt.jar;C:\Program Files\Java\jdk1.8.0_211\jre\lib\jsse.jar;C:\Program Files\Java\jdk1.8.0_211\jre\lib\management-agent.jar;C:\Program Files\Java\jdk1.8.0_211\jre\lib\plugin.jar;C:\Program Files\Java\jdk1.8.0_211\jre\lib\resources.jar;C:\Program Files\Java\jdk1.8.0_211\jre\lib\rt.jar;C:\Users\akoul680\intellij-workspace\basics\target\classes;C:\Users\akoul680\.m2\repository\com\zaxxer\HikariCP\3.4.1\HikariCP-3.4.1.jar;C:\Users\akoul680\.m2\repository\org\apache\kafka\kafka-clients\2.2.0\kafka-clients-2.2.0.jar;C:\Users\akoul680\.m2\repository\com\github\luben\zstd-jni\1.3.8-1\zstd-jni-1.3.8-1.jar;C:\Users\akoul680\.m2\repository\org\lz4\lz4-java\1.5.0\lz4-java-1.5.0.jar;C:\Users\akoul680\.m2\repository\org\xerial\snappy\snappy-java\1.1.7.2\snappy-java-1.1.7.2.jar;C:\Users\akoul680\.m2\repository\org\apache\avro\avro\1.9.0\avro-1.9.0.jar;C:\Users\akoul680\.m2\repository\com\fasterxml\jackson\core\jackson-core\2.9.8\jackson-core-2.9.8.jar;C:\Users\akoul680\.m2\repository\com\fasterxml\jackson\core\jackson-databind\2.9.8\jackson-databind-2.9.8.jar;C:\Users\akoul680\.m2\repository\com\fasterxml\jackson\core\jackson-annotations\2.9.0\jackson-annotations-2.9.0.jar;C:\Users\akoul680\.m2\repository\org\apache\commons\commons-compress\1.18\commons-compress-1.18.jar;C:\Users\akoul680\.m2\repository\com\codahale\metrics\metrics-core\3.0.2\metrics-core-3.0.2.jar;C:\Users\akoul680\.m2\repository\org\junit\jupiter\junit-jupiter-api\5.3.2\junit-jupiter-api-5.3.2.jar;C:\Users\akoul680\.m2\repository\org\apiguardian\apiguardian-api\1.0.0\apiguardian-api-1.0.0.jar;C:\Users\akoul680\.m2\repository\org\opentest4j\opentest4j\1.1.1\opentest4j-1.1.1.jar;C:\Users\akoul680\.m2\repository\org\junit\platform\junit-platform-commons\1.3.2\junit-platform-commons-1.3.2.jar;C:\Users\akoul680\.m2\repository\org\slf4j\slf4j-api\1.7.26\slf4j-api-1.7.26.jar;C:\Users\akoul680\.m2\repository\ch\qos\logback\logback-core\1.2.3\logback-core-1.2.3.jar;C:\Users\akoul680\.m2\repository\ch\qos\logback\logback-classic\1.2.3\logback-classic-1.2.3.jar;C:\Users\akoul680\.m2\repository\io\projectreactor\reactor-core\3.4.10\reactor-core-3.4.10.jar;C:\Users\akoul680\.m2\repository\org\reactivestreams\reactive-streams\1.0.3\reactive-streams-1.0.3.jar;C:\Users\akoul680\.m2
\repository\io\projectreactor\reactor-test\3.4.10\reactor-test-3.4.10.jar;C:\Users\akoul680\.m2\repository\commons-net\commons-net\3.6\commons-net-3.6.jar;C:\Users\akoul680\.m2\repository\com\box\box-java-sdk\2.32.0\box-java-sdk-2.32.0.jar;C:\Users\akoul680\.m2\repository\com\eclipsesource\minimal-json\minimal-json\0.9.1\minimal-json-0.9.1.jar;C:\Users\akoul680\.m2\repository\org\bitbucket\b_c\jose4j\0.4.4\jose4j-0.4.4.jar;C:\Users\akoul680\.m2\repository\org\bouncycastle\bcprov-jdk15on\1.52\bcprov-jdk15on-1.52.jar;C:\Users\akoul680\.m2\repository\com\jcraft\jsch\0.1.55\jsch-0.1.55.jar;C:\Users\akoul680\.m2\repository\org\apache\commons\commons-vfs2\2.4\commons-vfs2-2.4.jar;C:\Users\akoul680\.m2\repository\commons-logging\commons-logging\1.2\commons-logging-1.2.jar;C:\Users\akoul680\.m2\repository\org\bouncycastle\bcpkix-jdk15on\1.52\bcpkix-jdk15on-1.52.jar;C:\Users\akoul680\intellij-workspace\basics\lib\db2jcc4.jar" lrn.chapter14.ErrorHandling
2021-10-12T09:53:34,344 main r.util.Loggers - Using Slf4j logging framework
Exception in thread "main" java.lang.OutOfMemoryError: Forcing exception for 1
    at lrn.chapter14.ErrorHandling.lambda$null$0(ErrorHandling.java:19)
    at reactor.core.publisher.FluxMapFuseable$MapFuseableConditionalSubscriber.onNext(FluxMapFuseable.java:281)
    at reactor.core.publisher.Operators$ScalarSubscription.request(Operators.java:2398)
    at reactor.core.publisher.FluxMapFuseable$MapFuseableConditionalSubscriber.request(FluxMapFuseable.java:354)
    at reactor.core.publisher.FluxPeekFuseable$PeekFuseableConditionalSubscriber.request(FluxPeekFuseable.java:437)
    at reactor.core.publisher.MonoPeekTerminal$MonoTerminalPeekSubscriber.request(MonoPeekTerminal.java:139)
    at reactor.core.publisher.Operators$MultiSubscriptionSubscriber.set(Operators.java:2194)
    at reactor.core.publisher.FluxOnErrorResume$ResumeSubscriber.onSubscribe(FluxOnErrorResume.java:74)
    at reactor.core.publisher.MonoPeekTerminal$MonoTerminalPeekSubscriber.onSubscribe(MonoPeekTerminal.java:152)
    at reactor.core.publisher.FluxPeekFuseable$PeekFuseableConditionalSubscriber.onSubscribe(FluxPeekFuseable.java:471)
    at reactor.core.publisher.FluxMapFuseable$MapFuseableConditionalSubscriber.onSubscribe(FluxMapFuseable.java:263)
    at reactor.core.publisher.MonoJust.subscribe(MonoJust.java:55)
    at reactor.core.publisher.Mono.subscribe(Mono.java:4361)
    at reactor.core.publisher.FluxFlatMap$FlatMapMain.onNext(FluxFlatMap.java:426)
    at reactor.core.publisher.FluxRange$RangeSubscription.slowPath(FluxRange.java:156)
    at reactor.core.publisher.FluxRange$RangeSubscription.request(FluxRange.java:111)
    at reactor.core.publisher.FluxFlatMap$FlatMapMain.onSubscribe(FluxFlatMap.java:371)
    at reactor.core.publisher.FluxRange.subscribe(FluxRange.java:69)
    at reactor.core.publisher.Flux.subscribe(Flux.java:8468)
    at reactor.core.publisher.Flux.subscribeWith(Flux.java:8641)
    at reactor.core.publisher.Flux.subscribe(Flux.java:8438)
    at reactor.core.publisher.Flux.subscribe(Flux.java:8362)
    at reactor.core.publisher.Flux.subscribe(Flux.java:8280)
    at lrn.chapter14.ErrorHandling.ex5(ErrorHandling.java:26)
    at lrn.chapter14.ErrorHandling.main(ErrorHandling.java:12)

Process finished with exit code 1

ankush

1 Answer


You can't query the flux itself, but you can tell it to do something if it ever stops.

In the service that contains your Kafka listener, I'd recommend adding a terminated (or similar) boolean flag that's false by default. You can then ensure that the last operator in your flux is:

.doOnTerminate(() -> terminated = true)

...and then get the healthcheck endpoint to monitor that value, marking the container as unhealthy if that flag is ever true.
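
For illustration, here's a minimal sketch of that approach, assuming a reactor-kafka KafkaReceiver-based consumer and a plain Spring WebFlux controller. The class names, the /health/kafka path and the processing logic are made up for the example, not taken from the poster's application:

    import org.springframework.http.ResponseEntity;
    import org.springframework.stereotype.Service;
    import org.springframework.web.bind.annotation.GetMapping;
    import org.springframework.web.bind.annotation.RestController;
    import reactor.core.publisher.Mono;
    import reactor.kafka.receiver.KafkaReceiver;
    import reactor.kafka.receiver.ReceiverOptions;
    import reactor.kafka.receiver.ReceiverRecord;

    @Service
    public class KafkaConsumerService {

        // volatile so the flag written by the receiver pipeline is visible to the health check thread
        private volatile boolean terminated = false;

        // call this once at application startup, e.g. from an ApplicationRunner
        public void start(ReceiverOptions<String, String> options) {
            KafkaReceiver.create(options)
                    .receive()
                    .flatMap(this::process)                  // your existing processing logic goes here
                    .doOnTerminate(() -> terminated = true)  // fires on completion or (non-fatal) error
                    .subscribe();
        }

        private Mono<Void> process(ReceiverRecord<String, String> record) {
            record.receiverOffset().acknowledge();           // placeholder processing
            return Mono.empty();
        }

        public boolean isTerminated() {
            return terminated;
        }
    }

    @RestController
    class KafkaHealthController {

        private final KafkaConsumerService consumerService;

        KafkaHealthController(KafkaConsumerService consumerService) {
            this.consumerService = consumerService;
        }

        @GetMapping("/health/kafka")
        public ResponseEntity<String> kafkaHealth() {
            return consumerService.isTerminated()
                    ? ResponseEntity.status(503).body("Kafka receiver flux has terminated")
                    : ResponseEntity.ok("Kafka receiver flux is running");
        }
    }

Keeping the flag volatile ensures the health check thread doesn't read a stale value; if you also care about which signal ended the sequence, doFinally hands you the SignalType.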

doOnTerminate() is more reliable than doOnError() in this use case, as it executes whether the publisher terminates with an error or with a completion signal. As per the comment though, this isn't completely reliable - if your publisher terminates due to a fatal JVM error or similar, that doOnTerminate() operator won't be run.

In my experience, if this happens it's usually due to an OutOfMemoryError, in which case -XX:+ExitOnOutOfMemoryError is a good JVM option to use (the immediate exit can then trigger a restart policy straight away, rather than waiting for the healthcheck endpoint to be called to trigger the restart after a while).

Bear in mind there are other fatal JVM errors that wouldn't get caught by the above process though, so that's still not 100% reliable.

Michael Berry
  • Thanks! Is there a way to ensure that doOnTerminate always reliably executes when the publisher terminates? The documentation (https://projectreactor.io/docs/core/3.4.10/reference/index.html#error.handling) mentions that there is a class of exceptions that can't be propagated. It does not provide a list of such exceptions however. I have edited my post with an example where doOnTerminate / doFinally did not execute. – ankush Oct 12 '21 at 04:21
  • @ankush Afraid not that I'm aware of - this is the case when the exception falls within a few "fatal" exception categories, of which the one you're by far the most likely to see is `OutOfMemoryError`. I've added a bit about dealing with that in a better way, but afraid I can't pretend to have a solution that's 100% bulletproof. – Michael Berry Oct 12 '21 at 08:00
  • 1
    Thanks a lot for your help! I am going to incorporate your suggestions. – ankush Oct 12 '21 at 10:27
  • After spending some more time thinking about this, I have a query. I already have repeat/retryWhen operators in my sequence to indefinitely resubscribe to the source flux. The problem is that for some errors repeat/retryWhen don't run. I think doOnTerminate will be useful in my case, if it can run on some errors for which repeat/retryWhen don't run - so it can set the unhealthy flag. If repeat/retryWhen/doOnTerminate all run for the same domain of errors, then I think adding doOnTerminate won't be useful. Your thoughts? – ankush Oct 13 '21 at 06:07
  • 1
    @ankush In that case the only thing I think you'll get from doOnTermimate is the ability to fire on a completion signal. It could still be useful as a catch all in case your logic above is incorrect of course. (I could be wrong, reactive Kafka isn't my strongest point!) – Michael Berry Oct 13 '21 at 06:31
  • 1
    fatal exceptions in Reactor are `VirtualMachineError`, `ThreadDeath` and `LinkageError` (and child classes like `OutOfMemoryError`). what kind of non-retriable exception(s) did you identify @ankush ? – Simon Baslé Oct 14 '21 at 08:00
  • @SimonBaslé - Only OutOfMemoryError - the Reactor documentation mentions it as an example of an exception that cannot be propagated. Thanks for listing the others (I was worried that there would be many of them)! – ankush Oct 14 '21 at 11:53
  • 1
    not too many, only the ones that we'd consider irrecoverable at JVM level. see https://projectreactor.io/docs/core/release/api/reactor/core/Exceptions.html#throwIfFatal-java.lang.Throwable- – Simon Baslé Oct 14 '21 at 16:27
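
For reference, a small sketch of the check Simon is describing: Reactor's `Exceptions.throwIfFatal` rethrows these errors immediately instead of passing them to `onError`, which is why the terminal hooks never see them. The class name below is made up for the demo:

    import reactor.core.Exceptions;

    public class FatalCheckDemo {
        public static void main(String[] args) {
            // Not fatal: throwIfFatal simply returns, so a pipeline would route this to onError
            Exceptions.throwIfFatal(new IllegalStateException("ordinary exception"));
            System.out.println("IllegalStateException is not treated as fatal");

            try {
                // Fatal: rethrown as-is, bypassing onErrorResume/doOnTerminate/doFinally in an operator
                Exceptions.throwIfFatal(new OutOfMemoryError("simulated"));
            } catch (Throwable fatal) {
                System.out.println("Rethrown immediately: " + fatal);
            }
        }
    }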