I have a very large relational database dataset that I would like to index in Elasticsearch. The query that retrieves the data consists of multiple joins and all the other SQL goodies. The data is grouped and processed in memory to create a meaningful JSON representation, and a bulk update is built from the results and sent to Elasticsearch with the elastic4s Scala client.
I would like to introduce streaming into this process, as both Slick and the Elasticsearch client support it.
The problem is that the in-memory grouping and conversion to JSON only make sense once all the results (for a given relation) are loaded into memory: because of the several joins/left joins, I need to group by id and map the results in memory. How would this be handled with streaming?
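To make the grouping step concrete, here is a minimal sketch of the kind of in-memory group-by described above. The row and document shapes (`JoinedRow`, `ParentDoc`) are hypothetical stand-ins for the real join result and the JSON document; no Slick or elastic4s calls are involved. It shows why the whole result set is needed: a parent's document is only complete once every row carrying that parent id has been seen.

```scala
// Hypothetical flattened row produced by a LEFT JOIN between a parent table
// and a child table: parent columns repeat, child columns may be NULL.
final case class JoinedRow(parentId: Long, parentName: String, childName: Option[String])

// Hypothetical document shape that would later be serialized to JSON.
final case class ParentDoc(id: Long, name: String, children: List[String])

object InMemoryGrouping {
  // Groups all rows by parent id. A ParentDoc is only complete if every row
  // for that parent is present in `rows`, which is why the entire result set
  // is loaded into memory before grouping.
  def toDocs(rows: Seq[JoinedRow]): List[ParentDoc] =
    rows
      .groupBy(_.parentId)
      .toList
      .sortBy(_._1) // deterministic output order by parent id
      .map { case (id, group) =>
        // LEFT JOIN rows without a child contribute no child name
        ParentDoc(id, group.head.parentName, group.flatMap(_.childName).toList)
      }

  def main(args: Array[String]): Unit = {
    val rows = Seq(
      JoinedRow(1L, "a", Some("x")),
      JoinedRow(2L, "b", None),
      JoinedRow(1L, "a", Some("y"))
    )
    toDocs(rows).foreach(println)
  }
}
```

Note that the rows for parent `1` are not adjacent in the sample input, so emitting a document before the last row arrives would produce an incomplete `children` list. That is the constraint a streaming version has to deal with.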