RxJava: why are the same transformations recomputed for each observable branch?

Question

Introduction

Consider a simple piece of Java code. It defines two observables, a and b, in terms of c, which is itself defined using d (a, b, c and d all have type Observable<Integer>):

    d = Observable.range(1, 10);
    c = d.map(t -> t + 1);
    a = c.map(t -> t + 2);
    b = c.map(t -> t + 3);

This code can be visualised with a diagram in which each arrow (-->) represents a transformation (a map call):

             .--> a
   d --> c --|
             '--> b

If several chains of observables share a common part, then (in theory) new values of the common part need to be calculated only once. In the example above, every new d value could be transformed by d --> c only once and the result used for both a and b.

Question

In practice I observe that the transformation is calculated for each chain in which it is used (test). In other words, the example above should correctly be drawn like this:

   d --> c --> a
   d --> c --> b

With resource-consuming transformations, each new subscription at the end of a chain causes the whole chain to be recomputed (with a performance penalty).

Is there a proper way to force a transformation result to be cached and computed only once?

My research

I found two solutions to this problem:

1. Pass unique identifiers together with the values and store transformation results in some external storage (external to the Rx library).
2. Use a subject to implement a map-like function which hides the start of the observable chain. MapOnce code; test.

Both work. The second is simple but smells like a hack.
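The first approach can be sketched in plain Java without any Rx types. This is only an illustration under assumptions: the class name, the memoize helper and the call counter are hypothetical, and the value itself stands in for the "unique identifier" that would travel with each item.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

public class MemoizedTransform {
    // Counts how many times the expensive transformation really ran.
    static int calls = 0;

    // Hypothetical helper: wraps a function so that each key is computed at most once,
    // with results kept in storage that lives outside the observable chain.
    static <K, V> Function<K, V> memoize(Function<K, V> expensive, Map<K, V> store) {
        return key -> store.computeIfAbsent(key, expensive);
    }

    static int runDemo() {
        calls = 0;
        Map<Integer, Integer> store = new HashMap<>();
        Function<Integer, Integer> dToC = memoize(t -> { calls++; return t + 1; }, store);

        int sum = 0;
        // Two "branches" (a and b) both ask for c for the same ten values of d.
        for (int t = 1; t <= 10; t++) {
            sum += dToC.apply(t) + 2; // branch a
            sum += dToC.apply(t) + 3; // branch b, served from the store
        }
        return calls;
    }

    public static void main(String[] args) {
        System.out.println("d --> c computed " + runDemo() + " times"); // 10, not 20
    }
}
```

The cost is that the store grows with the number of distinct identifiers and has to be cleaned up by hand, which is part of why this feels external to the library.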

Dave Sexton · Accepted Answer · 2014-11-29T07:09:29.350

You've identified hot and cold observables.

Observable.range returns a cold observable, though you're describing the resulting queries in a hierarchy as if they're hot; i.e., as if they'd share subscription side effects. They do not. Each time that you subscribe to a cold observable it may cause side effects. In your case, each time that you subscribe to range (or to queries established on range) it generates a range of values.
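To make the per-subscription behaviour concrete, here is a minimal sketch in plain Java that models a cold observable as a function that is run from scratch on every subscribe. The Cold interface, the range helper and the counter are hypothetical stand-ins written for this illustration; they are not RxJava's types or its implementation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Function;

public class ColdChainDemo {
    // Toy stand-in for a cold observable: subscribing runs the producer again.
    interface Cold<T> {
        void subscribe(Consumer<T> observer);

        // map does no work by itself; it subscribes upstream once per downstream subscriber.
        default <R> Cold<R> map(Function<T, R> f) {
            return observer -> subscribe(t -> observer.accept(f.apply(t)));
        }
    }

    static Cold<Integer> range(int start, int count) {
        return observer -> {
            for (int i = start; i < start + count; i++) {
                observer.accept(i);
            }
        };
    }

    // Number of times the d --> c transformation ran.
    static int cCalls = 0;

    static int countCCalls() {
        cCalls = 0;
        Cold<Integer> d = range(1, 10);
        Cold<Integer> c = d.map(t -> { cCalls++; return t + 1; });
        Cold<Integer> a = c.map(t -> t + 2);
        Cold<Integer> b = c.map(t -> t + 3);

        List<Integer> seen = new ArrayList<>();
        a.subscribe(seen::add); // runs d and c for a
        b.subscribe(seen::add); // runs d and c again for b
        return cCalls;
    }

    public static void main(String[] args) {
        System.out.println("d --> c computed " + countCCalls() + " times"); // 20
    }
}
```

Each subscription wires up its own chain back to the source, so the shared-looking c is evaluated once per value per subscriber.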

In the second point of your research, you've identified how to convert a cold observable into a hot observable; namely, using Subjects. (Though in .NET you don't use a Subject<T> directly; instead, you'd use an operator like Publish. I suspect RxJava has a similar operator and I'd recommend using it.)

Additional Details

The definition of hot by my interpretation, as described in detail in my blog post linked above, is when an observable doesn't cause any subscription side effects. (Note that a hot observable may multicast connection side effects when converting from cold to hot, but temperature only refers to the propensity of an observable to cause subscription side effects because that's all we really care about when talking about an observable's temperature in practice.)

The map operator (Select in .NET, mentioned in the conclusion of my blog post) returns an observable that inherits the temperature of its source, so in your bottom diagram c, a and b are cold because d is cold. If, hypothetically, you were to apply publish to d, then c, a and b would inherit the hot temperature from the published observable, meaning that subscribing to them wouldn't cause any subscription side effects. Thus publishing d converts a cold observable, namely range, into a hot observable.

        .--> c --> a
    d --|
        '--> c --> b

However, your question was about how to share the computation of c as well as d. Even if you were to publish d, c would still be recomputed for both a and b for each notification from d. Instead, you want to share the results of c among a and b. I call an observable whose computation side effects you want to share "active". (I borrowed the term from the passive & active terminology used in neuroscience to describe electrochemical currents in neurons.)

In your top diagram, you're considering c to be active because it causes significant computation side effects, by your own interpretation. Note that c is active regardless of the temperature of d. To share the computation side effects of an active observable, perhaps surprisingly, you must use publish just like for a cold observable. This is because technically active computations are side effects in the same sense as cold observables, while passive computations have no side effects, just like hot observables. I've restricted the terms hot and cold to only refer to the initial computation side effects, which I call subscription side effects, because that's how people generally use them. I've introduced new terms, active and passive, to refer to computation side effects separately from subscription side effects.

The result is that these terms in practice just blend together intuitively. If you want to share the computation side effects of c, then simply publish it instead of d. By doing so, a and b implicitly become hot because map inherits subscription side effects, as stated previously. Therefore, you're effectively making the right side of the observable hot by publishing either d or c, but publishing c also shares its computation side effects.

If you publish c instead of d, then d remains cold, but it doesn't matter since c hides d from a and b. So by publishing c you're effectively publishing d as well. Therefore, applying publish anywhere within your observable makes the right side of the observable effectively hot. It doesn't matter where you introduce publish or how many observers or pipelines you're creating on the right side of the observable. However, choosing to publish c instead of d also shares the computation side effects of c, which technically completes the answer to your question. Q.E.D.
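The difference between publishing d and publishing c can be checked with a toy model in plain Java. Everything here is an assumption made for illustration: Source, map, Published and connect are hypothetical names that only mimic the shape of a connectable observable, and the sketch ignores completion, errors, unsubscription and threading.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Function;

public class PublishDemo {
    interface Source<T> {
        void subscribe(Consumer<T> observer);
    }

    static <T, R> Source<R> map(Source<T> src, Function<T, R> f) {
        return obs -> src.subscribe(t -> obs.accept(f.apply(t)));
    }

    // Toy connectable: collects observers and subscribes upstream once, on connect().
    static final class Published<T> implements Source<T> {
        private final Source<T> upstream;
        private final List<Consumer<T>> observers = new ArrayList<>();

        Published(Source<T> upstream) { this.upstream = upstream; }

        @Override
        public void subscribe(Consumer<T> observer) { observers.add(observer); }

        void connect() {
            upstream.subscribe(t -> observers.forEach(o -> o.accept(t)));
        }
    }

    static int cCalls;

    // Returns how many times the d --> c transformation ran.
    static int countCCalls(boolean publishC) {
        cCalls = 0;
        Source<Integer> d = obs -> { for (int i = 1; i <= 10; i++) obs.accept(i); };
        Function<Integer, Integer> expensive = t -> { cCalls++; return t + 1; };

        Published<Integer> hot;
        Source<Integer> c;
        if (publishC) {
            hot = new Published<>(map(d, expensive)); // share c's results
            c = hot;
        } else {
            hot = new Published<>(d);                 // share only d's values
            c = map(hot, expensive);
        }
        Source<Integer> a = map(c, t -> t + 2);
        Source<Integer> b = map(c, t -> t + 3);

        List<Integer> seen = new ArrayList<>();
        a.subscribe(seen::add);
        b.subscribe(seen::add);
        hot.connect();
        return cCalls;
    }

    public static void main(String[] args) {
        System.out.println("publish d: " + countCCalls(false)); // 20
        System.out.println("publish c: " + countCCalls(true));  // 10
    }
}
```

In both cases the source runs only once, but only publishing c moves the expensive step to the shared side of the connection point.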

I didn't think that this was to do with "Hot" or "Cold" at all, but just that Rx is a deferred execution model. It produces a computation chain that is independently wired up for each subscription and you have to use `Publish` or a `Subject` to share parts of the computation. Am I wrong? — Enigmativity, Nov 27 '14 at 01:38
@Enigmativity The laziness of the monad is orthogonal to the temperature of the observable. *Hot* and *cold* observables are both lazy in Rx; e.g., forming a query on a mouseMoves (*hot*) observable doesn't cause any side effects until you call *Subscribe*. `Publish` makes a `cold` observable `hot`, which effectively shares subscription side effects. Read my article that I linked to in the answer for details on why the temperature of an observable depends on its propensity to cause subscription side effects. — Dave Sexton, Nov 27 '14 at 02:35
I'm happy with that, but I don't think that changes the issue here as to why there are two separate subscription pipelines. It doesn't matter if they are hot or cold. — Enigmativity, Nov 27 '14 at 05:30
Stating the temperatures of the observables says exactly what is shown in the diagrams. The top diagram is *hot* and the bottom diagram is *cold*. Consider this: In the top diagram, which observer (`a` or `b`) causes `d` to begin pushing values when it subscribes? Obviously the answer is neither (unless you assume `RefCount` behavior, which we won't here for simplicity.) In other words, neither `a` nor `b` causes any subscription side effects. That's exactly the definition of `hot`. Only when `c` is eventually "connected" do subscription side effects occur, and they're multicast to `a` and `b` — Dave Sexton, Nov 27 '14 at 06:04
[...Ran out of comment space] So the OP wants to take a `cold` observable, namely `range`, and make it `hot` to share its subscription side effects. The fact that there are two separate subscription pipelines on the right side of both diagrams is only relevant in that the OP wants both of them to share a single subscription to `d`. Thus, the issue is simply how to share the subscription side effects on the left side of the diagram; i.e., how to convert a `cold` observable into a `hot` observable. And that's exactly what `Publish` does. — Dave Sexton, Nov 27 '14 at 06:12
The `publish` method helped me, thanks. "Connection point" is a good metaphor. Regarding observable temperature: to fix my example I need to publish `c`. If I make `d` hot by publishing it, then only `d` will be broadcast; `c` will not be broadcast but will be recomputed from `d` twice, once each for `a` and `b`. — Vasiliy Kevroletin, Nov 27 '14 at 09:42
Near the end of my blog post (linked in my answer), before the conclusion section, I defined 2 terms to refer to any computation side effects that may occur after the initial computation side effects: *passive* or *active*. If you consider `c` to be *passive*, then you don't need to `publish` it; otherwise, it's *active* and may be worth publishing. Technically, the terms *hot* and *cold* could cover these cases too, but I felt it was too confusing and not very useful. It's better to think of temperature (as most do) as applying to subscription side effects only; i.e., the initial computation. — Dave Sexton, Nov 27 '14 at 14:47
[...continued] Therefore, in your example above that publishes `c` rather than `d`, you're actually implying that `c` is *active*. But that doesn't change the fact that you want to convert `d` from a *cold* observable to a *hot* observable using `publish`. You're just also converting `c` from *cold* to *hot* at the same time. No matter how you cut it, this question is simply about how to make a *cold* observable *hot*, even in the general sense. The number of observers and the point at which the conversion takes place are just additional factors that are orthogonal to the primary issue. — Dave Sexton, Nov 27 '14 at 14:51
Most of the time, hot observables are described in the literature as "real time" event streams. I expected that applying `map` to a real-time observable would again give a real-time observable, but it appears this is only half true. @DaveSexton could you please edit your answer and add a few lines about passive/active observables? Then I will mark the question as solved. Thank you for your answers. — Vasiliy Kevroletin, Nov 28 '14 at 00:18
Well "real time" implies "always running", which is a symptom of being *hot*. The real definition of *hot*, by my interpretation as described in detail in my blog post, is when an observable doesn't cause any subscription side effects (though it may multicast connection side effects instead). The `map` operator (`Select` in .NET) returns an observable that inherits the temperature of its source, so it's still *hot* and you'll still receive notifications in "real time", as long as there's no backpressure. I introduced the term *active* only to identify costly computation side effects. — Dave Sexton, Nov 28 '14 at 16:32
I'll edit my post to include this information per your request. — Dave Sexton, Nov 28 '14 at 16:33

score 1 · Answer 2 · answered Nov 29 '14 at 16:32

An Observable is lazily executed each time it is subscribed to (either explicitly or implicitly via composition).

This code shows how the source emits for a, b, and c:

    Observable<Integer> d = Observable.range(1, 10)
            .doOnNext(i -> System.out.println("Emitted from source: " + i));
    Observable<Integer> c = d.map(t -> t + 1);
    Observable<Integer> a = c.map(t -> t + 2);
    Observable<Integer> b = c.map(t -> t + 3);

    a.forEach(i -> System.out.println("a: " + i));
    b.forEach(i -> System.out.println("b: " + i));
    c.forEach(i -> System.out.println("c: " + i));

If you are okay buffering (caching) the result then it is as simple as using the .cache() operator to achieve this.

    Observable<Integer> d = Observable.range(1, 10)
            .doOnNext(i -> System.out.println("Emitted from source: " + i))
            .cache();

    Observable<Integer> c = d.map(t -> t + 1);
    Observable<Integer> a = c.map(t -> t + 2);
    Observable<Integer> b = c.map(t -> t + 3);

    a.forEach(i -> System.out.println("a: " + i));
    b.forEach(i -> System.out.println("b: " + i));
    c.forEach(i -> System.out.println("c: " + i));

Adding the .cache() to the source makes it so it only emits once and can be subscribed to many times.
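The effect on the source can be modelled with a small plain-Java sketch. It assumes a synchronous, finite source; Source, the cache helper and the emission counter are hypothetical names for this illustration and do not reflect how RxJava implements cache().

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

public class CacheDemo {
    interface Source<T> {
        void subscribe(Consumer<T> observer);
    }

    // Toy analogue of cache(): the first subscription runs upstream and records
    // every value; that subscriber and all later ones are served from the buffer.
    static <T> Source<T> cache(Source<T> upstream) {
        List<T> buffer = new ArrayList<>();
        boolean[] filled = {false};
        return observer -> {
            if (!filled[0]) {
                upstream.subscribe(buffer::add);
                filled[0] = true;
            }
            buffer.forEach(observer);
        };
    }

    // Number of values the underlying range actually emitted.
    static int emissions;

    static int countEmissions(boolean useCache, int subscribers) {
        emissions = 0;
        Source<Integer> range = obs -> {
            for (int i = 1; i <= 10; i++) {
                emissions++;
                obs.accept(i);
            }
        };
        Source<Integer> d = useCache ? cache(range) : range;

        List<Integer> seen = new ArrayList<>();
        for (int s = 0; s < subscribers; s++) {
            d.subscribe(seen::add);
        }
        return emissions;
    }

    public static void main(String[] args) {
        System.out.println("without cache: " + countEmissions(false, 3)); // 30
        System.out.println("with cache:    " + countEmissions(true, 3));  // 10
    }
}
```

The buffer holds every emitted value, which is why this approach is unsuitable for large or unbounded sources.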

For large or infinite data sources caching is not an option so multicasting is the solution to ensure the source only emits once.

The publish() and share() operators are a good place to start, but for simplicity, and since this is a synchronous example, I'll show with the publish(function) overload which is often the easiest to use.

    Observable<Integer> d = Observable.range(1, 10)
            .doOnNext(i -> System.out.println("Emitted from source: " + i))
            .publish(oi -> {
                Observable<Integer> c = oi.map(t -> t + 1);
                Observable<Integer> a = c.map(t -> t + 2);
                Observable<Integer> b = c.map(t -> t + 3);

                return Observable.merge(a, b, c);
            });

    d.forEach(System.out::println);

If a, b, c are wanted individually then we can wire everything up and "connect" the source when ready:

    private static void publishWithConnect() {
        ConnectableObservable<Integer> d = Observable.range(1, 10)
                .doOnNext(i -> System.out.println("Emitted from source: " + i))
                .publish();

        Observable<Integer> c = d.map(t -> t + 1);
        Observable<Integer> a = c.map(t -> t + 2);
        Observable<Integer> b = c.map(t -> t + 3);

        a.forEach(i -> System.out.println("a: " + i));
        b.forEach(i -> System.out.println("b: " + i));
        c.forEach(i -> System.out.println("c: " + i));

        // now that we've wired up everything we can connect the source
        d.connect();
    }

Or if the source is async we can use refCounting:

    Observable<Integer> d = Observable.range(1, 10)
            .doOnNext(i -> System.out.println("Emitted from source: " + i))
            .subscribeOn(Schedulers.computation())
            .share();

However, refCount (share() is shorthand for publish().refCount()) allows race conditions, so it won't guarantee that all subscribers get the first values. It is usually only wanted for "hot" streams where subscribers are coming and going. For a "cold" source whose every value should reach every subscriber, the previous solutions with cache() or publish()/publish(function) are the preferred approach.

You can learn more here: https://github.com/ReactiveX/RxJava/wiki/Connectable-Observable-Operators
