
tldr: Using Spring, reactive Kafka, WebFlux and coroutines; I need to somehow connect the receiver Flux to a coroutine Flow, but are GlobalScope and Dispatchers.Unconfined the right tools for the job?

The context of the question:

I am using Spring, reactive Kafka, WebFlux and coroutines in my project.

My task is to consume a message from a Kafka topic, then call the WebFlux client, retrieve the response and do some logic based on the client response.

The main issue I encountered is how to convert a Flux of receiver records into a coroutine Flow.

The solution I ended up with is the following piece of code:

    @OptIn(DelicateCoroutinesApi::class)
    override suspend fun connect() = receiver.receive()
        .groupBy { it.receiverOffset().topicPartition() }
        .asFlow()
        .onEach { partition ->
            partition.asFlow()
                .onEach { record -> handleRecord(record) }
                .flowOn(Dispatchers.Unconfined)
                .launchIn(GlobalScope)
        }
        .flowOn(Dispatchers.Unconfined)
        .launchIn(GlobalScope)

I launch the connect function for every consumer in a @PostConstruct method, where consumers is a list of beans implementing the connect function:

    @OptIn(DelicateCoroutinesApi::class)
    @PostConstruct
    fun connectAll() = consumers.forEach { consumer ->
        GlobalScope.launch {
            consumer.connect()
        }
    }

The way I understand it, this works as follows:

  • connectAll() launches every connect method in a separate coroutine in the global scope, so they all run asynchronously without interfering with each other, which is what I need, since the consumers should work independently.
  • every connect method then receives a Flux of ReceiverRecords and groups them by partition
  • every partition-to-records GroupedFlux is run in GlobalScope, on whatever thread the dispatcher picks. If one of them fails for some reason, it won't crash the other partition-to-records groups
  • every record within a partition-to-records GroupedFlux is run in a separate coroutine in GlobalScope, unconfined to a particular thread

The problem with the unconfined dispatcher is that it may block a thread if the coroutine runs blocking code; that should not be the case in my app, since it is supposed to use a non-blocking stack.

However, I still have a question: should I change some of the dispatchers in my code?
Is it more beneficial to use Dispatchers.IO for the client calls performed in the handleRecord method? I would not say the record handling is CPU-intensive enough to warrant Dispatchers.Default, so it is probably one of those two.
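
To make that concrete, here is a simplified sketch of what I mean by the client call inside handleRecord (the endpoint, types and acknowledge placement are placeholders, not my real code); the withContext(Dispatchers.IO) wrapper is exactly the part I am unsure about:

    import kotlinx.coroutines.Dispatchers
    import kotlinx.coroutines.withContext
    import org.springframework.web.reactive.function.client.WebClient
    import org.springframework.web.reactive.function.client.awaitBody
    import reactor.kafka.receiver.ReceiverRecord

    class RecordHandler(private val client: WebClient) {

        suspend fun handleRecord(record: ReceiverRecord<String, String>) {
            // Is wrapping the (already non-blocking) WebClient call in
            // Dispatchers.IO beneficial, or is it just an extra context switch?
            val response = withContext(Dispatchers.IO) {
                client.get()
                    .uri("/some/endpoint") // placeholder endpoint
                    .retrieve()
                    .awaitBody<String>()
            }
            // ... logic based on the response ...
            record.receiverOffset().acknowledge()
        }
    }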

The other thing that confuses me is whether GlobalScope is the right tool for the job.
The docs state that it is a delicate API and should be avoided. On the other hand, it does accomplish my goal of running a consumer throughout the application lifecycle.
Furthermore, the coroutines where the records are handled are not children of the one processing the partitions. So if a record happens to produce an exception, it will not affect the other records and partitions.

As I see it, GlobalScope allows me to run the consumers without worrying that an exception from one of them might disturb another consumer.
The same goes for partitions and records.
But maybe it is more beneficial to have my own scope/context for this kind of task?
And is it possible for the records within one partition to be processed in offset order when using this scope?
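
For reference, this is roughly what I imagine by "my own context" instead of GlobalScope: a scope with a SupervisorJob tied to the bean lifecycle, so one failing consumer still does not cancel the others (the names here are illustrative, not my actual beans):

    import javax.annotation.PostConstruct // jakarta.annotation.* on Spring Boot 3
    import javax.annotation.PreDestroy
    import kotlinx.coroutines.CoroutineScope
    import kotlinx.coroutines.Dispatchers
    import kotlinx.coroutines.SupervisorJob
    import kotlinx.coroutines.cancel
    import kotlinx.coroutines.launch

    interface ConnectableConsumer {
        suspend fun connect()
    }

    class ConsumerStarter(private val consumers: List<ConnectableConsumer>) {

        // SupervisorJob: a failure in one child does not cancel its siblings.
        // The scope lives as long as this bean, so the consumers still run for
        // the whole application lifecycle, like they do with GlobalScope now.
        private val scope = CoroutineScope(SupervisorJob() + Dispatchers.Default)

        @PostConstruct
        fun connectAll() = consumers.forEach { consumer ->
            scope.launch {
                consumer.connect()
            }
        }

        @PreDestroy
        fun shutdown() = scope.cancel()
    }

Would something along those lines be the recommended replacement for GlobalScope here, or is GlobalScope acceptable for this use case?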

  • I think a good idea here is to set up a custom thread pool with a fixed number of threads: Executors.newFixedThreadPool(n).asCoroutineDispatcher(). So you can have one for the consumers that read the messages and another one for the ones that execute the action. Here is a nice example of what you can do: https://gist.github.com/jivimberg/b0f4f94871c6f3e7d17fae1106c28047 – Lucas Milotich Nov 07 '22 at 16:25
  • @LucasMilotich thanks for the gist link, that is an interesting piece of code. Let's say I create a separate thread pool, then why not just use Dispatchers.IO, which has a thread pool of 64 threads by default? It feels like a separate threadpool might be an allocation of extra resources, when we have pretty much what we need already. – Dknot Nov 07 '22 at 21:55
  • First of all, because reading a message will be much faster than consuming and reacting to it, you will probably need more threads on the consuming part than on the reading part. And just for the sake of resources, IMHO it's better to separate the concerns. On the other hand, IO threads are not initialized by default; they are created on demand. You can check it here: https://kotlinlang.org/api/kotlinx.coroutines/kotlinx-coroutines-core/kotlinx.coroutines/-dispatchers/-i-o.html – Lucas Milotich Nov 08 '22 at 22:47
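
To spell out the thread-pool suggestion from the comments, a minimal sketch of two dedicated fixed-size pools turned into coroutine dispatchers (the pool sizes and names are arbitrary placeholders, not a recommendation):

    import java.util.concurrent.Executors
    import kotlinx.coroutines.asCoroutineDispatcher

    // One small pool for the coroutines reading records from Kafka and a larger
    // one for the coroutines handling them, as suggested in the comments above.
    val readerDispatcher = Executors.newFixedThreadPool(2).asCoroutineDispatcher()
    val handlerDispatcher = Executors.newFixedThreadPool(16).asCoroutineDispatcher()

The reading part of the pipeline would then use flowOn(readerDispatcher) and the record handling flowOn(handlerDispatcher) instead of Dispatchers.Unconfined.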

0 Answers