
To allow multiple iterations over the stream resulting from a CompletableFuture<Stream<String>>, I am considering one of the following approaches:

  1. Convert the resulting future to CompletableFuture<List<String>> through: teams.thenApply(st -> st.collect(toList()))

  2. Convert the resulting future to Flux<String> with cache: Flux.fromStream(teams::join).cache();

Flux<T> is the implementation of Publisher<T> in project reactor.

Use case:

I would like to get a sequence of the Premier League team names (e.g. Stream<String>) from a data source that provides a League object with a Standing[] (based on the football-data RESTful API, e.g. http://api.football-data.org/v1/soccerseasons/445/leagueTable). Using AsyncHttpClient and Gson we have:

CompletableFuture<Stream<String>> teams = asyncHttpClient
    .prepareGet("http://api.football-data.org/v1/soccerseasons/445/leagueTable")
    .execute()
    .toCompletableFuture()
    .thenApply(Response::getResponseBody)
    .thenApply(body -> gson.fromJson(body, League.class))
    .thenApply(l -> stream(l.standings).map(s -> s.teamName));
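
The Gson mapping above assumes DTO classes shaped roughly like the following; the field names are illustrative and must match the JSON payload returned by the leagueTable endpoint:

    // Minimal DTO sketch for the Gson deserialization above (illustrative field names).
    class League {
        Standing[] standings;   // table rows
    }

    class Standing {
        String teamName;        // name of the team in that row
    }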

To re-use the resulting stream I have two options:

1. CompletableFuture<List<String>> res = teams.thenApply(st -> st.collect(toList()))

2. Flux<String> res = Flux.fromStream(teams::join).cache()

Flux<T> is less verbose and provides all that I need. Yet, is it correct to use it in this scenario?

Or should I use CompletableFuture<List<String>> instead? Or is there any other better alternative?

UPDATED with some thoughts (2018-03-16):

CompletableFuture<List<String>>:

  • [PROS] The List<String> is collected in a continuation, so by the time we need to proceed with the result of the future it may already be completed.
  • [CONS] Declaration verbosity.
  • [CONS] If we only want to use it once, then we did not need to collect the items into a List<T> at all.

Flux<String>:

  • [PROS] Declaration conciseness
  • [PROS] If we just want to use it once, we can omit .cache() and forward it to the next layer, which can take advantage of the reactive API, e.g. a WebFlux reactive controller: @GetMapping(produces = MediaType.TEXT_EVENT_STREAM_VALUE) public Flux<String> getTeams() {…} (see the controller sketch after this list).
  • [CONS] If we want to reuse that Flux<T> we have to wrap it in a cacheable Flux<T> (i.e. call .cache()), which in turn adds overhead on the first traversal, because it has to store the resulting items in an internal cache.
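
A minimal sketch of the WebFlux controller idea from the second [PROS] item (class name, path and wiring are made up; it assumes a Flux<String> of team names is available for injection):

    import org.springframework.http.MediaType;
    import org.springframework.web.bind.annotation.GetMapping;
    import org.springframework.web.bind.annotation.RestController;
    import reactor.core.publisher.Flux;

    // Hypothetical controller: forwards the Flux to the client as a server-sent event stream.
    @RestController
    public class TeamsController {

        private final Flux<String> teams; // e.g. built from Flux.fromStream(teamsFuture::join)

        public TeamsController(Flux<String> teams) {
            this.teams = teams;
        }

        @GetMapping(path = "/teams", produces = MediaType.TEXT_EVENT_STREAM_VALUE)
        public Flux<String> getTeams() {
            return teams;
        }
    }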

2 Answers

    CompletableFuture<Stream<String>> teams = ...;
    Flux<String> teamsFlux = Mono.fromFuture(teams).flatMapMany(stream -> Flux.fromStream(stream));

Flux.fromStream(teams::join) is a code smell because it's blocking the current thread to fetch the result from the CompletableFuture which is running on another thread.
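
If the sequence also needs to be traversed more than once, the same non-blocking bridge can be combined with cache(); a minimal sketch, reusing the teams future from the question:

    Flux<String> cachedTeams = Mono.fromFuture(teams)
        .flatMapMany(Flux::fromStream)
        .cache();

    cachedTeams.subscribe(System.out::println);           // first traversal consumes the underlying stream
    cachedTeams.count().subscribe(System.out::println);   // later traversals are replayed from the cache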

  • I think it does not hold the thread that invokes the `fromStream(...)`. That is lazy. We are just passing a method handle to `join` instead of invoking it. Only after a subscription may the pushing thread block on `join()`, and only if the source `CompletableFuture` has not completed yet. – Miguel Gamboa Jun 27 '18 at 11:29

Once you have downloaded the league table and extracted the team names from it, I'm not sure you need a back-pressure-ready stream to iterate over these items. Converting the stream to a standard list (or array) should be good enough, and should probably perform better, no?

For instance:

String[] teamNames = teams.join().toArray(String[]::new);
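
The materialized array can then be traversed any number of times without re-fetching or re-collecting anything, for instance:

    // Illustrative re-use of the array (uses java.util.Arrays from the JDK).
    long longNames = Arrays.stream(teamNames).filter(name -> name.length() > 10).count();
    String first = teamNames.length > 0 ? teamNames[0] : null;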
  • I am not sure about the overheads. Collecting items into a `List` also adds overhead, doesn't it? If we are not always reusing the resulting `teams`, then creating a _cold stream_ from a `Supplier` of `Stream` may be more efficient because we are not creating an auxiliary `List` object. On the other hand, if we are sure that we want to reuse `teams`, then maybe it is better to collect it into a `List` than to use `cache()`, because the latter adds overhead on the first traversal. These are my doubts!! – Miguel Gamboa Mar 13 '18 at 11:20
  • Indeed, you can keep a stream if your intent is to read the stream once; but if you have to process the collection of items several times, I would prefer to transform the stream of items into an array of items rather than use a back-pressure-ready cold stream to iterate over them. – Stéphane Appercel Mar 15 '18 at 08:45
  • Not sure turning an async pipeline into a sync pipeline is the best suggestion I have ever seen. – Boris the Spider Jun 27 '18 at 06:48