How do I properly use Flow.onStart {} to re-fetch cached content?

Question

I have a method for fetching Something, let's make it a String for simplicity. The method should return a flow that initially emits the cached string, and then emits the “fresh” value after querying my API.

Thankfully Room emits new data whenever a given table is updated, so that part of the logic works out of the box. I’ve got the refreshing/re-fetching to work as well. But when I try to use .onStart{} (which IMHO looks a bit cleaner), that’s when both the functionality and my understanding fall apart :/

Here's a proof of concept that should run within IntelliJ or Android Studio without too many unusual dependencies:

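import kotlinx.coroutines.coroutineScope
import kotlinx.coroutines.delay
import kotlinx.coroutines.flow.*
import kotlinx.coroutines.launch
import kotlinx.coroutines.runBlocking

// Room automatically emits new values on dbFlow when the relevant table is updated
val dbFlow = MutableStateFlow("cachedValue")

// refresh simulates fetchSomethingFromApi().also { someDao.updateData(it) }
val refresh = suspend {
    delay(1000) // simulate API delay
    dbFlow.value = "freshValueFromAPI"
}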

suspend fun doesNotWork(): Flow<String> = dbFlow
    .onStart {
        coroutineScope {
            launch {
                refresh()
            }
        }
    }

suspend fun thisWorks(): Flow<String> = flow {
    coroutineScope {
        launch {
            refresh()
        }
        dbFlow.collect {
            emit(it)
        }
    }
}

How to test:

runBlocking {
    thisWorks().take(2).collect {
        println(it)
    }
}

or:

runBlocking {
    doesNotWork().take(2).collect {
        println(it)
    }
}

I expect both to produce the same results. However, the one with .onStart {} never emits the cached value, so .take(2) never completes (the flow only ever emits once).
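A self-contained reproduction of both cases, assuming kotlinx-coroutines-core is on the classpath. The MutableStateFlow stands in for the Room query as in the snippet above, the `suspend` modifier is dropped from the two builders because creating a Flow does not suspend, and `firstTwo` is a hypothetical helper that uses withTimeoutOrNull (with an arbitrary 2500 ms limit) only so the hanging case terminates.

```kotlin
import kotlinx.coroutines.coroutineScope
import kotlinx.coroutines.delay
import kotlinx.coroutines.flow.*
import kotlinx.coroutines.launch
import kotlinx.coroutines.runBlocking
import kotlinx.coroutines.withTimeoutOrNull

// Stand-in for a Room query flow
val dbFlow = MutableStateFlow("cachedValue")

// Simulated API call that writes its result into the "database"
val refresh = suspend {
    delay(1000)
    dbFlow.value = "freshValueFromAPI"
}

fun doesNotWork(): Flow<String> = dbFlow.onStart {
    coroutineScope { launch { refresh() } } // suspends until refresh() is done
}

fun thisWorks(): Flow<String> = flow {
    coroutineScope {
        launch { refresh() } // runs concurrently with the collection below
        emitAll(dbFlow)
    }
}

// Hypothetical helper: collects up to two values and reports
// what was seen and whether take(2) completed before the timeout
suspend fun firstTwo(source: Flow<String>): Pair<List<String>, Boolean> {
    dbFlow.value = "cachedValue" // reset the simulated cache
    val seen = mutableListOf<String>()
    val completed = withTimeoutOrNull(2500) {
        source.take(2).collect { seen += it }
        true
    } ?: false
    return seen to completed
}

fun main() = runBlocking {
    println(firstTwo(thisWorks()))   // ([cachedValue, freshValueFromAPI], true)
    println(firstTwo(doesNotWork())) // ([freshValueFromAPI], false)
}
```

The second line shows the symptom: the cached value is never observed, and only one value arrives.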

What is going on here?

score 3 · Accepted Answer · answered Feb 01 '21 at 21:58

The reason for this behavior is twofold.

a) onStart { ... } is executed before the flow is collected. A simple example:

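import kotlinx.coroutines.flow.*
import kotlinx.coroutines.runBlocking

fun main() = runBlocking {
    flow {
        emit("foo")
    }.onStart {
        println("bar") // runs before the upstream flow starts emitting
    }.collect {
        println(it)
    }
}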

produces

bar
foo

b) coroutineScope { ... } does not return until all child coroutines launched inside its block have completed. Another example:

suspend fun foo() {
    coroutineScope {
        launch {
            delay(1000)
        }
    }
}

Calling this function takes ~1000 ms, because coroutineScope waits until the inner child coroutine has completed.
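The difference can be measured with a small sketch. `fooInScope` is a hypothetical contrast function, not from the original: it launches the same child into a caller-supplied scope instead, so it returns immediately and does not wait for the delay.

```kotlin
import kotlinx.coroutines.*
import kotlin.system.measureTimeMillis

// Waits for its child: returns only after the launched delay has finished
suspend fun foo() {
    coroutineScope {
        launch { delay(1000) }
    }
}

// Hypothetical contrast: launches into an outer scope and returns at once
fun fooInScope(scope: CoroutineScope) {
    scope.launch { delay(1000) }
}

fun main() = runBlocking {
    val waited = measureTimeMillis { foo() }
    val notWaited = measureTimeMillis { fooInScope(this) }
    // expect roughly 1000 ms for foo and close to 0 ms for fooInScope
    println("foo: $waited ms, fooInScope: $notWaited ms")
}
```

Note that runBlocking itself still waits for the child started by fooInScope before main returns; only the call to fooInScope is non-blocking.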

Now to your example:

suspend fun doesNotWork(): Flow<String> = dbFlow
    .onStart {
        coroutineScope {
            launch {
                refresh()
            }
        }
    }

According to b), this behaves the same as:

suspend fun doesNotWork(): Flow<String> = dbFlow
    .onStart {
        refresh()
    }

Since onStart { ... } is executed before the flow is collected, this is the same as writing:

suspend fun doesNotWork(): Flow<String> = flow {
    refresh()
    // could be simplified to emitAll(dbFlow)
    dbFlow.collect {
        emit(it)
    }
}

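Now you can see how this differs from your working example. Here you first refresh from your API and only then start emitting values from the DB, whereas your working example launches a new coroutine that refreshes from your API asynchronously and immediately starts emitting values from your DB. Because a StateFlow such as dbFlow hands a new collector only its current value, by the time collection starts in the onStart version the cached value has already been overwritten, so "freshValueFromAPI" is the only value ever emitted.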
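If you want to keep onStart, one option (a sketch, not the only approach) is to launch the refresh into a scope that outlives the onStart block. onStart then returns immediately and collection of dbFlow begins while the refresh is still running. The scope is passed in as a parameter here; in an Android app it would presumably be something like a ViewModel's scope, which is an assumption about your setup. `worksWithOnStart` is a hypothetical name.

```kotlin
import kotlinx.coroutines.CoroutineScope
import kotlinx.coroutines.delay
import kotlinx.coroutines.flow.*
import kotlinx.coroutines.launch
import kotlinx.coroutines.runBlocking

val dbFlow = MutableStateFlow("cachedValue")

val refresh = suspend {
    delay(1000) // simulate API delay
    dbFlow.value = "freshValueFromAPI"
}

// onStart only *starts* the refresh in an outer scope and returns right away,
// so dbFlow is collected (and emits the cached value) before refresh() finishes
fun worksWithOnStart(scope: CoroutineScope): Flow<String> = dbFlow
    .onStart { scope.launch { refresh() } }

fun main() = runBlocking {
    worksWithOnStart(this).take(2).collect { println(it) }
}
```

The trade-off is that the refresh is no longer tied to the collector: if collection is cancelled early, the refresh keeps running until the outer scope is cancelled. The flow { coroutineScope { ... } } version from the question does not have that issue, since its launched refresh is a child of the collection.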

`emitAll()` seems to act as terminal, although in theory I agree that it should do the same. In other words, `emitAll()` will finish the flow after the first two emits and not wait for subsequent emits (not relevant in this example, but in the full Room use case it matters) — cekrem, Feb 02 '21 at 07:56
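Whether emitAll(dbFlow) itself ends early can be checked directly. In the sketch below, the second simulated refresh and the 100 ms delays are made up for the test; collecting three values instead of two suggests that emitAll keeps forwarding later updates and that it is take(n) that ends the collection. This is a check of the simplified StateFlow setup only, not of the full Room use case.

```kotlin
import kotlinx.coroutines.coroutineScope
import kotlinx.coroutines.delay
import kotlinx.coroutines.flow.*
import kotlinx.coroutines.launch
import kotlinx.coroutines.runBlocking

val dbFlow = MutableStateFlow("cachedValue")

fun withEmitAll(): Flow<String> = flow {
    coroutineScope {
        launch {
            // two simulated refreshes instead of one (assumed for this check)
            delay(100); dbFlow.value = "fresh1"
            delay(100); dbFlow.value = "fresh2"
        }
        emitAll(dbFlow) // a StateFlow never completes, so neither does this call
    }
}

fun main() = runBlocking {
    println(withEmitAll().take(3).toList()) // [cachedValue, fresh1, fresh2]
}
```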
