I am new to Spark, and I have a prototype application based on Spark Structured Streaming in which I continuously read stream data from Kafka.
On this stream data, let's say I have to apply multiple aggregations:
1) group by key1 and generate sum and count
2) group by key1 and key2 and generate count
and so on...
If I create the above two aggregations as separate streaming queries, two independent streaming queries are created, each reading from Kafka independently, which is not what I want. Caching the data from Kafka and then performing multiple aggregations doesn't seem to work in Structured Streaming.
What is the best way to do multiple aggregations on streaming data?
Some posts suggest flatMapGroupsWithState might work for such a use case, but I can't find any examples of it.