
I am a newbie in Spark, and I have a prototype application based on Spark Structured Streaming in which I continuously read stream data from Kafka.

On this stream data, let's say I have to apply multiple aggregations:

1) group by key1 and generate sum and count
2) group by key1 and key2 and generate count, and so on...

If I create the above two aggregations as streaming queries, two independent streaming queries are created, each reading from Kafka independently, which is not what I want. Caching the data from Kafka and then performing multiple aggregations doesn't seem to work in Structured Streaming.
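Roughly what I am doing now, as a simplified sketch (the topic name `events`, the comma-separated value schema, and the column names are made up for illustration):

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

object MultiAggDemo {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder.appName("MultiAggDemo").getOrCreate()
    import spark.implicits._

    // Read from Kafka; "events" and the broker address are placeholders
    val stream = spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "localhost:9092")
      .option("subscribe", "events")
      .load()
      .selectExpr("CAST(value AS STRING)")
      .select(split($"value", ",").as("cols"))
      .select(
        $"cols"(0).as("key1"),
        $"cols"(1).as("key2"),
        $"cols"(2).cast("double").as("amount"))

    // Aggregation 1: group by key1, generate sum and count
    val agg1 = stream.groupBy($"key1")
      .agg(sum($"amount").as("total"), count("*").as("cnt"))

    // Aggregation 2: group by key1 and key2, generate count
    val agg2 = stream.groupBy($"key1", $"key2")
      .agg(count("*").as("cnt"))

    // Starting each aggregation as its own query is what creates
    // two independent consumers, each reading the topic from Kafka
    val q1 = agg1.writeStream.outputMode("complete").format("console").start()
    val q2 = agg2.writeStream.outputMode("complete").format("console").start()

    spark.streams.awaitAnyTermination()
  }
}
```

Each call to `start()` above launches a separate streaming query with its own Kafka source, which is the duplicated-read behavior I want to avoid.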

What is the best way to do multiple aggregations on streaming data?

Some posts suggest that flatMapGroupsWithState might work for such a use case, but I can't find any examples of it.

