I'm looking for a data structure that I can use for Snapshot (a sequence) in the following example:
val oldSnapshot = Snapshot(3, 2, 1)
val newSnapshot = 4 +: (oldSnapshot dropRight 1) // basically a shift
// newSnapshot is now passed to another function that also knows the old snapshot
// this check should be fast if true
if (newSnapshot.tail == (oldSnapshot dropRight 1)) { /* reuse intermediate results */ }
Background: I need an immutable data structure that stores a snapshot of the last n items that appeared in a stream. When a new item appears in the stream, the snapshot is updated and the oldest item is dropped, so the length is always at most n and the snapshots resemble a sliding window over the last n elements. In rare cases the stream can be interrupted and restarted. After a restart, the stream first emits at least n older elements before it continues to emit new "live" elements. However, some of these elements may differ from what was seen before, so I cannot be sure that a new snapshot of the recent history can be derived from an older snapshot just by appending new elements.
I further have a component that consumes a stream of these snapshots and does some incremental processing of the elements. It might for instance keep track of the sum of the elements. For a new snapshot it has to decide whether it was derived by appending one or a few elements to the end of the last known snapshot (and dropping the oldest elements) so it doesn't have to process all the old items again but can reuse some intermediate results.
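To make the consumer side concrete, here is a minimal sketch of what I have in mind, assuming Int elements, a full window, and the newest-first ordering from the example above. The names SumState, init and update are made up for illustration, and Vector is only a stand-in for whatever structure ends up being used. Note that the comparison in this sketch is an ordinary element-wise ==, which is exactly the part I would like to be cheap:

```scala
// Hypothetical consumer state: the last snapshot plus a running sum over it.
// Convention as in the example above: newest element first, oldest last.
final case class SumState(snapshot: Vector[Int], sum: Long)

object SumState {
  // Full (non-incremental) processing of a snapshot.
  def init(snapshot: Vector[Int]): SumState =
    SumState(snapshot, snapshot.foldLeft(0L)(_ + _))

  // Incremental update: reuse the old sum if `next` is `prev` shifted by one.
  def update(prev: SumState, next: Vector[Int]): SumState = {
    val old = prev.snapshot
    val shiftedByOne =
      old.nonEmpty && next.length == old.length &&
        next.tail == old.dropRight(1) // element-wise comparison in general
    if (shiftedByOne)
      // Add the new head, subtract the element that fell out of the window.
      SumState(next, prev.sum + next.head - old.last)
    else
      init(next) // anything else: fall back to reprocessing everything
  }
}
```

For example, starting from Vector(3, 2, 1) and updating with Vector(4, 3, 2) takes the incremental branch, while an unrelated snapshot falls back to init.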
By far the most common case is that the snapshot was shifted to include a single new element and the oldest was dropped. This case should be recognized really fast. If I kept track of the whole history of elements and did not drop the oldest, I could use a List and just compare the tail of the new list to the last known list. Internally, it would compare object identity, and in most cases this would be enough to see that the lists are identical.
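The List idea can be sketched as follows. This is only an illustration under the assumption that the full history is kept and new elements are prepended; ListHistoryDemo and isOneStepExtension are hypothetical names. It uses an explicit eq (reference equality) rather than relying on what == does internally, since I'm not sure the identity shortcut is guaranteed across Scala versions:

```scala
object ListHistoryDemo {
  // True when `next` was built by prepending exactly one element onto `prev`,
  // i.e. the tail of `next` is the very same object as `prev` (constant time).
  def isOneStepExtension[A](prev: List[A], next: List[A]): Boolean =
    next.nonEmpty && (next.tail eq prev)

  val oldHistory: List[Int] = List(3, 2, 1)

  // Prepending shares structure: the tail of newHistory *is* oldHistory.
  val newHistory: List[Int] = 4 :: oldHistory

  // An equal list built from scratch shares nothing with oldHistory.
  val rebuilt: List[Int] = List(4, 3, 2, 1)
}
```

Here isOneStepExtension(oldHistory, newHistory) holds, whereas for rebuilt it does not, even though rebuilt.tail == oldHistory is still true after an element-wise comparison. The problem is that this only works because nothing is ever dropped from the end.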
I'm thinking about using a Vector or a similar data structure for the snapshots, and I'm wondering whether such comparisons would also be guaranteed to be efficient in this sense, or whether there is perhaps a better-suited data structure that internally uses object identity checks for subcollections to quickly determine whether two instances are identical.