
I'm rethinking how our Spring MVC application behaves: is it better to pull data from the database (Java 8 Stream), or to let the database push its data (Reactive / Observable) and use backpressure to control the amount?


Current situation:

  1. User requests the 30 most recent articles
  2. Service does a database query and puts the 30 results into a List
  3. Jackson iterates over the List and generates the JSON response

Why switch the implementation?

It's quite memory-consuming, because we keep all 30 objects in memory for the whole request. That's not needed, because the application processes one object at a time. Instead, the application should be able to retrieve one object, process it, throw it away, and get the next one.


Java8 Streams? (pull)

With java.util.stream.Stream this is quite easy: the Service creates a Stream, which uses a database cursor behind the scenes. Each time Jackson has written the JSON String for one element of the Stream, it asks for the next one, which triggers the database cursor to return the next entry.
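To make the pull model concrete, here is a minimal sketch. `FakeCursor` and `PullDemo` are made-up names, and the cursor is only an in-memory stand-in for a real database cursor (no Spring Data or JDBC involved). It shows that a sequential Stream built on an Iterator only reads rows when the consumer asks for them.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.Spliterator;
import java.util.Spliterators;
import java.util.stream.Stream;
import java.util.stream.StreamSupport;

// Hypothetical stand-in for a database cursor; it counts how many rows were read.
class FakeCursor implements Iterator<String> {
    private final int total;
    private int position = 0;

    FakeCursor(int total) { this.total = total; }

    @Override
    public boolean hasNext() { return position < total; }

    @Override
    public String next() {
        if (!hasNext()) throw new NoSuchElementException();
        position++;
        return "article-" + position;
    }

    int rowsRead() { return position; }
}

class PullDemo {
    // Wraps the cursor in a sequential, lazily evaluated Stream.
    static Stream<String> stream(FakeCursor cursor) {
        Spliterator<String> split =
                Spliterators.spliteratorUnknownSize(cursor, Spliterator.ORDERED);
        return StreamSupport.stream(split, false);
    }

    public static void main(String[] args) {
        FakeCursor cursor = new FakeCursor(30);
        List<String> written = new ArrayList<>();
        // The consumer (standing in for Jackson) stops after 5 elements,
        // so the cursor is not drained.
        stream(cursor).limit(5).forEach(written::add);
        System.out.println(written.size() + " written, " + cursor.rowsRead() + " rows read");
    }
}
```

Whether each pulled element costs a network round trip depends on how the real cursor fetches rows, which this sketch does not model.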


RxJava / Reactive / Observable? (push)

Here we have the opposite scenario: The database has to push entry by entry and Jackson has to create the JSON String for each element until the onComplete method has been called.

That is, the Controller asks the Service for an Observable<Article>. Then Jackson can ask for as many database entries as it can process.


Differences and concerns:

With Streams there's always some delay between asking for the next database entry and retrieving / processing it. This could slow down the JSON response if the network connection is slow or if a large number of database requests have to be made to fulfill the response.

Using RxJava there should always be data available to process. And if it's too much, we can use backpressure to slow down the data transfer from the database to our application. In the worst case the buffer/queue will contain all requested database entries. Then the memory consumption will be equal to our current solution using a List.
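The backpressure idea can be sketched without RxJava, using the JDK's `java.util.concurrent.Flow` interfaces (Java 9+) instead. `CountingPublisher` and `BatchingSubscriber` are hypothetical names, and the publisher is a synchronous in-memory source rather than a database driver. The point is that the subscriber decides how many items may be in flight by calling `request(n)`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Flow;

// Hypothetical synchronous source: emits 1..count, but never more than was requested.
class CountingPublisher implements Flow.Publisher<Integer> {
    private final int count;

    CountingPublisher(int count) { this.count = count; }

    @Override
    public void subscribe(Flow.Subscriber<? super Integer> subscriber) {
        subscriber.onSubscribe(new Flow.Subscription() {
            private long demand = 0;
            private int emitted = 0;
            private boolean emitting = false;
            private boolean done = false;

            @Override
            public void request(long n) {
                if (done) return;
                if (n <= 0) {
                    done = true;
                    subscriber.onError(new IllegalArgumentException("request must be positive"));
                    return;
                }
                demand += n;
                if (emitting) return; // re-entrant call from onNext: the loop below picks it up
                emitting = true;
                while (!done && demand > 0 && emitted < count) {
                    demand--;
                    emitted++;
                    subscriber.onNext(emitted);
                }
                emitting = false;
                if (!done && emitted == count) {
                    done = true;
                    subscriber.onComplete();
                }
            }

            @Override
            public void cancel() { done = true; }
        });
    }
}

// Requests a new batch only after the previous one has been fully processed,
// so at most `batch` items are ever outstanding.
class BatchingSubscriber implements Flow.Subscriber<Integer> {
    final List<Integer> received = new ArrayList<>();
    final List<Integer> requests = new ArrayList<>();
    boolean completed = false;

    private final int batch;
    private int outstanding = 0;
    private Flow.Subscription subscription;

    BatchingSubscriber(int batch) { this.batch = batch; }

    @Override
    public void onSubscribe(Flow.Subscription s) { subscription = s; requestBatch(); }

    @Override
    public void onNext(Integer item) {
        received.add(item);
        outstanding--;
        if (outstanding == 0) requestBatch();
    }

    @Override
    public void onError(Throwable t) { throw new IllegalStateException(t); }

    @Override
    public void onComplete() { completed = true; }

    private void requestBatch() {
        outstanding += batch;
        requests.add(batch);
        subscription.request(batch);
    }
}

class BackpressureDemo {
    public static void main(String[] args) {
        BatchingSubscriber subscriber = new BatchingSubscriber(10);
        new CountingPublisher(30).subscribe(subscriber);
        System.out.println(subscriber.received.size() + " items, requests: " + subscriber.requests);
    }
}
```

A real reactive driver would emit asynchronously, and RxJava adds its own buffering operators on top. This only illustrates the request/emit handshake.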


Why am I asking / What am I asking for?

  • What did I miss? Are there any other pros / cons?

  • Why did the Spring Data Team (in particular) extend their API to support Stream responses from the database, if there's always a (short) delay between each database request/response? This could add up to a noticeable delay for a large number of requested entries.

  • Is it recommended to go for RxJava (or some other reactive implementation) for this scenario? Or did I miss any drawbacks?

Holger
Benjamin M

1 Answer


You seem to be talking about the fetch size for an underlying database engine.

If you reduce it to one (fetching and processing one row at a time), then yes, you will save some memory while the request is being processed...

But it usually makes sense to have a reasonable chunk size. If it is too small, you will have a lot of expensive network round trips. If the chunk size is too large, you risk running out of memory or introducing too much latency per fetch. So it is a compromise, and the right chunk/fetch size depends on your specific use case.
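As a rough illustration of that compromise, here is the arithmetic for the 30 articles from the question. `FetchSizeDemo` is a made-up helper. It assumes every fetch costs exactly one round trip and buffers at most one chunk, which real drivers may not match. (In JDBC, `Statement.setFetchSize` is only a hint to the driver.)

```java
class FetchSizeDemo {
    // Number of round trips needed to read `rows` rows in chunks of `fetchSize`.
    static int roundTrips(int rows, int fetchSize) {
        return (rows + fetchSize - 1) / fetchSize; // ceiling division
    }

    // Largest number of rows held in memory at once under this simple model.
    static int maxBuffered(int rows, int fetchSize) {
        return Math.min(rows, fetchSize);
    }

    public static void main(String[] args) {
        int rows = 30;
        for (int fetchSize : new int[] {1, 10, 30}) {
            System.out.println("fetchSize=" + fetchSize
                    + " roundTrips=" + roundTrips(rows, fetchSize)
                    + " maxBuffered=" + maxBuffered(rows, fetchSize));
        }
        // prints:
        // fetchSize=1 roundTrips=30 maxBuffered=1
        // fetchSize=10 roundTrips=3 maxBuffered=10
        // fetchSize=30 roundTrips=1 maxBuffered=30
    }
}
```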

As for whether to use a reactive approach or not, I believe it is not relevant here. For example, with RxJava and, say, Cassandra, one can create an Observable from an asynchronous result set, and it is up to the query (configuration) how many items are fetched and pushed at a time.

yurgis
  • Yes, I guess both ways are effectively a **pull**. The `Stream` will pull one item at a time (which does 1 roundtrip per database entry). The `RxJava` approach would be some kind of buffered pull. It fetches e.g. 10 items, puts them into a queue/buffer and then pushes one item at a time to the `Observer`. *(I wonder if it's possible to keep the buffer filled all the time, i.e. request another 10 items when the buffer has only 5 items left.)* `...` Implementing this kind of behavior using `java.util.stream` should be possible as well, but would require some serious work and time. – Benjamin M Jun 17 '16 at 23:45
  • In that case it all comes down to how fast your producer (db) and consumer are relative to each other. If the producer can keep up with the consumer then backpressure will definitely help if you decide to go with the reactive approach. It will also likely help you minimize the fetch size up to the point where it no longer keeps up with the consumer. – yurgis Jun 17 '16 at 23:53
  • So.. correct me if I'm wrong. All in all I'd say that `1.` using `java.util.stream.Stream` is not the right solution when response time is crucial, because at the end there is a `forEach(...)` which triggers a database call for every element. It could still be a nice solution for background tasks where response time doesn't matter. `2.` the reactive pattern has an integrated buffer/queue, which allows fetching multiple db entries at once. *And* one can still set the fetch size to `1` and get the same behavior as with the stream. `3.` fetch size = tradeoff between latency and memory usage – Benjamin M Jun 18 '16 at 00:03
  • #2. Yes, that's pretty much what I meant. With #1, I believe you can still craft a "smart" streaming data source that fetches multiple items at a time, e.g. a stream of lists where you flatMap each list further down into individual items. – yurgis Jun 18 '16 at 00:09
  • Of course I could write that, but why should I spend time on that if there's already a well tested and working solution `;-)`. Anyway thank you for your help and clarifying the pros and cons!! – Benjamin M Jun 18 '16 at 00:28
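For reference, the "stream of lists" idea from the comments could look roughly like this. `ChunkedSource` and `fetchChunk` are hypothetical names, and the chunks are generated in memory instead of coming from a database. It assumes the total row count is known up front.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.Collectors;
import java.util.stream.IntStream;
import java.util.stream.Stream;

// Hypothetical chunked data source: each fetchChunk call stands for one round trip.
class ChunkedSource {
    final AtomicInteger roundTrips = new AtomicInteger();
    private final int totalRows;
    private final int chunkSize;

    ChunkedSource(int totalRows, int chunkSize) {
        this.totalRows = totalRows;
        this.chunkSize = chunkSize;
    }

    // Returns the rows of one chunk; the last chunk may be shorter.
    List<String> fetchChunk(int chunkIndex) {
        roundTrips.incrementAndGet();
        int from = chunkIndex * chunkSize;
        int to = Math.min(from + chunkSize, totalRows);
        List<String> chunk = new ArrayList<>();
        for (int i = from; i < to; i++) {
            chunk.add("article-" + (i + 1));
        }
        return chunk;
    }

    // A stream of chunks, flattened into a stream of individual rows.
    Stream<String> stream() {
        int chunks = (totalRows + chunkSize - 1) / chunkSize;
        return IntStream.range(0, chunks)
                .mapToObj(this::fetchChunk)
                .flatMap(List::stream);
    }

    public static void main(String[] args) {
        ChunkedSource source = new ChunkedSource(30, 10);
        List<String> articles = source.stream().collect(Collectors.toList());
        System.out.println(articles.size() + " articles in " + source.roundTrips.get() + " round trips");
        // prints: 30 articles in 3 round trips
    }
}
```

Unlike the reactive buffer, this does not prefetch the next chunk while the current one is being processed. Each chunk is fetched only when the consumer reaches it.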