
I am trying to process a tree of data objects. Each leaf of the tree should be processed by a function running in a coroutine, and all of the work should run on a fixed-size thread pool.

So I came up with this:

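// Placeholder: an instance of WorkspaceEntry (tree structure)
val node: WorkspaceEntry = TODO("an instance of WorkspaceEntry (tree structure)")
// newFixedThreadPoolContext also takes a thread-name prefix; "worker-pool" is an arbitrary choice
val localDispatcher = newFixedThreadPoolContext(16, "worker-pool")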

fun main() {
    val job = SupervisorJob()
    val scope = CoroutineScope(localDispatcher + job)
    handleEntry(node, scope)

    runBlocking {
        job.join()
    }
}

The handleEntry method recursively launches a child job in the supervisor for each tree leaf.

The child jobs of the supervisor all complete successfully, but the join never returns. Am I understanding this wrong?

Edit: the handleEntry function

private fun handleEntry(workspaceEntry: WorkspaceEntry, scope: CoroutineScope) {
    if (workspaceEntry is FileEntry) {
        scope.launch {
            FileTypeRegistry.processFile(workspaceEntry.fileBlob)
        }
    } else {
        workspaceEntry.children.forEach { child -> handleEntry(child, scope) }
    }
}
Sergio
J Horseman
  • add please code for `handleEntry` function. – Sergio Dec 24 '18 at 18:21
  • by "join never returns" do you mean the thread is blocked and the app is hung? The `main` function does not complete? – Sergio Dec 25 '18 at 07:25
  • Yes. It will wait indefinitely for the job to complete. I checked: The child jobs do complete and get destroyed, until the supervisor does not have any children left. But the job never enters the complete-state itself. – J Horseman Dec 25 '18 at 10:20
  • It seems that if you `cancel` the `SupervisorJob`, it cancels all its children, and you *can* then wait for it and its children to complete with `join`. But of course you get a possibly unwanted `CancellationException` in each child job, which stops its execution as soon as it calls a `suspend` function. – xuiqzy Oct 01 '20 at 15:00

2 Answers


It seems that the Job used to create the CoroutineContext (in your case the SupervisorJob) is not intended for waiting for child coroutines to finish, so you can't use job.join(). I guess the main purpose of that Job is to cancel child coroutines. Changing the runBlocking block to the following will work:

runBlocking {
    job.children.forEach {
        it.join()
    }
}
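To try the approach above in isolation, here is a minimal, self-contained sketch. The names `processLeaf` and `runLeaves` are hypothetical stand-ins for `FileTypeRegistry.processFile` and the tree traversal, and the pool size is arbitrary. It also shows a possible alternative: assuming kotlinx.coroutines 1.2 or later, where `SupervisorJob()` returns a `CompletableJob`, calling `complete()` lets `join()` return once the children have finished.

```kotlin
import kotlinx.coroutines.*
import java.util.concurrent.Executors
import java.util.concurrent.atomic.AtomicInteger

// Hypothetical stand-in for FileTypeRegistry.processFile.
suspend fun processLeaf(counter: AtomicInteger) {
    delay(10)
    counter.incrementAndGet()
}

fun runLeaves(leafCount: Int): Int {
    val processed = AtomicInteger(0)
    val dispatcher = Executors.newFixedThreadPool(4).asCoroutineDispatcher()
    val job = SupervisorJob()
    val scope = CoroutineScope(dispatcher + job)

    // One child coroutine per "leaf", all registered under the supervisor.
    repeat(leafCount) { scope.launch { processLeaf(processed) } }

    runBlocking {
        // Option A: join every child still registered with the supervisor.
        job.children.forEach { it.join() }
        // Option B (assumes kotlinx.coroutines 1.2+): mark the supervisor as
        // complete so that join() returns after all children have finished.
        job.complete()
        job.join()
    }
    dispatcher.close()
    return processed.get()
}

fun main() {
    println(runLeaves(8))
}
```

Either option on its own should be enough here; both are shown only for comparison. Note that `job.children` is a snapshot, so coroutines launched after the loop starts are not covered by option A.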
Sergio
  • While this would work, I am pretty sure this is not the intended solution. The documentation of join explicitly states that the child jobs must be complete, too. Also, since the supervisor itself has no work assigned, it should switch to the completed state immediately after the job is started. The job is started both when child jobs are launched and when join() is called. At some point during development there was a function called joinChildren(), but it's gone now. – J Horseman Dec 25 '18 at 12:12
  • But yeah, since I probably have not misunderstood anything fundamental, I'll have to do it that way. – J Horseman Dec 25 '18 at 12:37

You have mixed two roles:

  1. the master job found in the coroutine scope that never completes on its own and is used to control the lifecycle of everything else
  2. the job corresponding to a unit of work, possibly decomposed into more child jobs

You need both, like this:

val masterJob = SupervisorJob()
val scope = CoroutineScope(localDispatcher + masterJob)

val unitOfWork = scope.launch { handleEntry(node, scope) }
runBlocking { unitOfWork.join() }

The above code doesn't really motivate the existence of the master job because you start just one child job from it, but it may make sense in a wider picture, where you have some context from which you launch many jobs, and want to be able to write

masterJob.cancel()

to cancel everything before it's done.
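If the leaves also need to fail independently while the unit of work waits for all of them, one possible variation (not part of the answer above) is to run the traversal inside `supervisorScope` and launch the leaf coroutines as its children. The sketch below assumes hypothetical `Entry`, `Leaf` and `Dir` types in place of `WorkspaceEntry`, an arbitrary pool size, and a no-op `CoroutineExceptionHandler` that simply discards leaf failures.

```kotlin
import kotlinx.coroutines.*
import java.util.concurrent.ConcurrentLinkedQueue
import java.util.concurrent.Executors

// Hypothetical stand-ins for WorkspaceEntry and FileEntry.
sealed class Entry
class Leaf(val name: String) : Entry()
class Dir(val children: List<Entry>) : Entry()

// Launches one coroutine per leaf as a child of the receiver scope.
fun CoroutineScope.handle(entry: Entry, done: MutableCollection<String>) {
    when (entry) {
        is Leaf -> launch {
            // Simulated failure for one particular leaf.
            require(entry.name != "bad") { "cannot process ${entry.name}" }
            done += entry.name
        }
        is Dir -> entry.children.forEach { handle(it, done) }
    }
}

fun processTree(root: Entry): List<String> {
    val done = ConcurrentLinkedQueue<String>()
    val dispatcher = Executors.newFixedThreadPool(4).asCoroutineDispatcher()
    // Discards leaf failures so they do not reach the thread's default handler.
    val handler = CoroutineExceptionHandler { _, _ -> }
    val masterJob = SupervisorJob()
    val scope = CoroutineScope(dispatcher + masterJob + handler)

    // The unit of work completes only after every leaf coroutine has finished,
    // and a failing leaf does not cancel its siblings.
    val unitOfWork = scope.launch {
        supervisorScope { handle(root, done) }
    }
    runBlocking { unitOfWork.join() }

    masterJob.cancel()
    dispatcher.close()
    return done.sorted()
}

fun main() {
    val tree = Dir(listOf(Leaf("a"), Dir(listOf(Leaf("bad"), Leaf("b")))))
    println(processTree(tree))
}
```

Because the leaves are children of the `supervisorScope` rather than siblings of `unitOfWork`, joining `unitOfWork` is enough, and `masterJob` keeps its role as the handle for cancelling everything.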

Marko Topolnik
  • in fact I use the supervisor job, because I launch multiple child jobs from it (handleEntry recursively launches more jobs) and I want them to be able to fail independently without interrupting the whole process. – J Horseman Dec 25 '18 at 14:59
  • But the solution does not work. Instead of blocking indefinitely, the master job waits for handleEntry() to finish and instantly switches to the completed state: join() returns. However, the additional jobs launched within handleEntry (using the scope parameter) are still running. – J Horseman Dec 25 '18 at 15:03
  • Are you talking about my code or yours? `masterJob` doesn't wait for anything to complete so it never changes state. `unitOfWork` propagates failure to the other coroutines, that's true. So your situation is actually not a standard one and therefore the approach with `masterJob.children.forEach { it.join() }` is the right one. – Marko Topolnik Dec 25 '18 at 16:23
  • Okay, but then what's the point of adding `masterJob` to the scope? – J Horseman Dec 25 '18 at 18:37
  • `masterJob` is a `SupervisorJob`, which is why it doesn't propagate the failure of your child coroutines. It is also the collector of all the coroutines started within its scope, so you can `join` them all. – Marko Topolnik Dec 25 '18 at 20:44
  • So I have to hold a reference to the `masterJob` to cancel *and* join all its children? I cannot do this with the reference to the `scope`, even though I can `cancel` the scope and all children of the job are cancelled? – xuiqzy Oct 01 '20 at 14:35
  • You don't. The job is there at `scope.coroutineContext[Job]!!`. – Marko Topolnik Oct 02 '20 at 12:42
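The last comment can be sketched as follows: the scope's job is read from `scope.coroutineContext[Job]`, so no separate reference to it has to be kept. The helper names `joinChildren` and `countFinished` are hypothetical, and `Dispatchers.Default` is used here only to keep the example short.

```kotlin
import kotlinx.coroutines.*
import java.util.concurrent.atomic.AtomicInteger

// Hypothetical helper: suspends until every coroutine currently registered
// under the scope's Job has finished.
suspend fun joinChildren(scope: CoroutineScope) {
    val job = scope.coroutineContext[Job] ?: error("scope has no Job")
    job.children.forEach { it.join() }
}

fun countFinished(n: Int): Int {
    val scope = CoroutineScope(Dispatchers.Default + SupervisorJob())
    val finished = AtomicInteger(0)
    repeat(n) {
        scope.launch {
            delay(10)
            finished.incrementAndGet()
        }
    }
    // Join through the scope alone, without a reference to the SupervisorJob.
    runBlocking { joinChildren(scope) }
    // Cancelling through the scope works the same way.
    scope.cancel()
    return finished.get()
}

fun main() {
    println(countFinished(5))
}
```

The null check replaces the `!!` from the comment; a scope created with `CoroutineScope(...)` always has a job, so the error branch is only a safeguard.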