I am currently using Spark 2.2.0 Structured Streaming.
Given a stream of timestamped data with watermarking, is there a way to combine (1) the groupBy operation, to achieve windowing by the timestamp field together with other grouping criteria, with (2) the groupByKey operation, in order to apply mapGroupsWithState to the groups for custom sessionization?
Or do I have to settle for somehow embedding the windowing and other grouping logic into groupByKey?
For context:
Calling groupBy, which supports windowing, on a Dataset returns a RelationalGroupedDataset, which does not have mapGroupsWithState.
Calling groupByKey, which supports mapGroupsWithState, returns a KeyValueGroupedDataset, but that has no support for windowing!
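To make the fallback option concrete, here is a minimal sketch of what I mean by embedding the windowing into the grouping key. It is plain Java with no Spark dependency and only shows the bucketing arithmetic for fixed-size (tumbling) windows aligned to the epoch. The WindowKey class, its fields, and the single String groupId standing in for the "other grouping criteria" are all hypothetical names of my own, not part of any Spark API.

```java
import java.util.Objects;

// Hypothetical composite key: the non-time grouping criteria plus the window start.
// Not a Spark class; it only illustrates what a groupByKey key function could return.
final class WindowKey {
    final String groupId;      // stand-in for the other grouping columns (assumption)
    final long windowStartMs;  // inclusive start of the tumbling window, epoch millis

    WindowKey(String groupId, long windowStartMs) {
        this.groupId = groupId;
        this.windowStartMs = windowStartMs;
    }

    // Start of the epoch-aligned tumbling window that contains eventTimeMs.
    // floorMod keeps the result correct for negative timestamps as well.
    static long windowStart(long eventTimeMs, long windowMs) {
        if (windowMs <= 0) {
            throw new IllegalArgumentException("windowMs must be positive");
        }
        return eventTimeMs - Math.floorMod(eventTimeMs, windowMs);
    }

    // Builds the key for one record: same group and same window => equal keys.
    static WindowKey of(String groupId, long eventTimeMs, long windowMs) {
        return new WindowKey(groupId, windowStart(eventTimeMs, windowMs));
    }

    @Override
    public boolean equals(Object other) {
        if (this == other) return true;
        if (!(other instanceof WindowKey)) return false;
        WindowKey that = (WindowKey) other;
        return windowStartMs == that.windowStartMs && groupId.equals(that.groupId);
    }

    @Override
    public int hashCode() {
        return Objects.hash(groupId, windowStartMs);
    }

    @Override
    public String toString() {
        return "WindowKey(" + groupId + ", " + windowStartMs + ")";
    }
}
```

The idea would be to pass something like WindowKey.of as the key function to groupByKey and then call mapGroupsWithState on the resulting KeyValueGroupedDataset. A real key type would additionally need a Spark encoder, and sliding (overlapping) windows would need each record to be emitted once per window it falls into before grouping, so this only covers the simplest case.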
Edit:
The issue is now tracked by SPARK-21641 - Combining windowing (groupBy) and mapGroupsWithState (groupByKey) in Spark Structured Streaming.