I'm looking for the best way to accumulate the last N messages in a Spark DStream, where N (the number of messages to retain) is configurable.
For example, given the following stream, I'd like to retain the last 3 elements:
Iteration | New message | Downstream
--------- | ----------- | ----------
1         | A           | [A]
2         | B           | [A, B]
3         | C           | [A, B, C]
4         | D           | [B, C, D]
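To make the desired semantics concrete, here is a minimal model of the behavior in the table (plain Python, not Spark code): a bounded buffer that keeps only the last N messages as new ones arrive.

```python
from collections import deque

def replay(messages, n):
    """Feed messages one at a time into a buffer of size n and
    record the buffer contents after each iteration."""
    buf = deque(maxlen=n)  # automatically drops the oldest element
    states = []
    for msg in messages:
        buf.append(msg)
        states.append(list(buf))
    return states

# Replaying the table above with N = 3:
# replay(["A", "B", "C", "D"], 3)
# -> [["A"], ["A", "B"], ["A", "B", "C"], ["B", "C", "D"]]
```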
So far I'm looking at the following methods on DStream:
- updateStateByKey: since all messages share the same key, I could use this, but it seems odd that a keyed operation is needed at all.
- mapWithState: the Scala API feels too tedious for such a simple task.
- window: doesn't seem to fit, since it windows by a time duration rather than by the last N elements.
- Accumulators: I haven't really used these yet (see Accumulators in the Spark docs).
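For reference, the updateStateByKey route I'm considering would look roughly like this (a hedged sketch in PySpark terms; the dummy key `"all"` and the function names are my own, not from any real job). `updateStateByKey` invokes the update function with the batch's new values and the previous state, so the state itself can be the bounded buffer:

```python
N = 3  # number of messages to retain

def update_last_n(new_values, last_state):
    """Update function for updateStateByKey: append the batch's new
    messages to the prior buffer and keep only the last N."""
    buffer = (last_state or []) + list(new_values)
    return buffer[-N:]

# In a streaming job, every message would be wrapped with the same
# dummy key so all state lands under one entry (assumed pipeline):
#   stream.map(lambda msg: ("all", msg)) \
#         .updateStateByKey(update_last_n)
```

This works, but it illustrates the oddity: the key exists only to satisfy the API, not the problem.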
What's the best solution to achieve this?