Let's think the windowing functions as a traditional GROUP BY operation which works on the time-based input data, applies a given aggregation function and outputs the result.
The key difference between a Tumbling Window (TW) operation and a Sliding Window (SW) one consists in the intersection set of the considered data points, which is empty in the former case and likely non-empty in the latter case.
A very good reading from Microsoft Azure Stream Analytics makes the difference with illustrations.
- TW, considering a tick of 10s such windowing operation outputs every tick the result from the aggregation function for such time frame;
- SW, considering a tick of Xs such windowing operation outputs every tick the result from the aggregation function for a Ys time frame whenever an event occurs, for instance
X = 1
and Y = 10
then every second the windowing function is looking back ten seconds discarding systematically the oldest data point.
Let's look at a concrete example for a the following time series:
t0-> 5 7 4 3 1 1 3 t10-> 4 5 8 1 2 3 3 3 5 7 7 t20-> t30-> 3 3 4 t40->
Considering SUM
as aggregation function and the banal strategy of SW which is the Hoping Window (HW):
- at
t0
SW = TW = 0
;
- at
t10
SW = TW = 24
;
- at
t11
there's no TW but SW = 23
and the intersection among successive windows is 7 4 3 1 1 3
;
- at
t11
there's no TW but SW = 21
and the intersection among successive windows is 4 3 1 1 3 4
;
- at
t20
TW = SW = 48
;
- at
t21
there's no TW but SW = 44
and the intersection among successive windows is 5 8 1 2 3 3 3 5 7 7
- at
t30
TW = SW = 0
;
- at
t31
there's no TW but SW = 3
and the intersection among successive windows is empty as no events occurred in [t20, t30]
.
Another good reading by SoftwareMill CTO Adam Warski which exemplifies using modern streaming technologies like Spark, Flink, Akka and Kafka.