
This question is not specific to any programming language. So here is the question:

I have a sliding window of size n containing real numbers. When the window is full, each new insertion evicts the oldest value (FIFO). Every now and then (K times in total), I need the value at a specific index, which varies per query, among the sorted values inside the window. So I have to spend O(K*n log(n)) to sort the window and O(1) to obtain my value. Would it be possible to reduce this complexity? In the worst case (which happens frequently), I have to sort the window for every new entry.

I was thinking of maintaining indices over the values of the sliding window and maintaining a sorted list. That would save the sorting cost, but what would the insertion and deletion complexity be? Also, does a data structure like this already exist?

2 Answers


If you maintain the window (of size n) elements in an order statistic tree (https://en.wikipedia.org/wiki/Order_statistic_tree), then it will cost you O(log n) time to advance the window and O(log n) time to find the i-th largest element in the window for any i. This will be advantageous if you have to do the query often.

An order statistic tree is just a balanced binary search tree wherein each node is augmented with the size of its subtree, which lets you drill down directly to the element at a given rank.
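For concreteness, here is a minimal sketch of that core idea in Python. To keep the sketch short, the tree below is deliberately not self-balancing; a real order statistic tree would add red/black (or AVL) rebalancing so that the depth, and hence every operation, stays O(log n). All names are illustrative.

```python
# Minimal sketch of a size-augmented BST (the core of an order statistic
# tree). NOT self-balancing: a production version would rebalance (e.g.
# red/black) so depth stays O(log n). Deletion follows the same pattern
# as insertion (fix up sizes on the way back up) and is omitted for brevity.

class Node:
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None
        self.size = 1  # number of keys in the subtree rooted here

def size(node):
    return node.size if node else 0

def insert(node, key):
    if node is None:
        return Node(key)
    if key < node.key:
        node.left = insert(node.left, key)
    else:
        node.right = insert(node.right, key)
    node.size = 1 + size(node.left) + size(node.right)
    return node

def select(node, i):
    """Return the i-th smallest key (0-based) under node."""
    left = size(node.left)
    if i < left:
        return select(node.left, i)
    if i == left:
        return node.key
    return select(node.right, i - left - 1)
```

With rebalancing added, advancing the window is one O(log n) deletion plus one O(log n) insertion, and each rank query is one O(log n) select.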

Matt Timmermans

Efficiently selecting the ith element

So I have to spend O(K*n log(n)) to sort the window and O(1) to obtain my value. Would it be possible to reduce this complexity?

Yes. By using QuickSelect instead of fully sorting, you could reduce the cost of K selections to O(Kn) expected time (a median-of-medians pivot would make that a worst-case bound). If you add in the cost of m insertions and deletions at O(1) each, that's a total cost of O(m + Kn).
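As a hedged illustration, here is one common formulation of QuickSelect (the three-way-partition variant) in Python; the window itself can be any O(1) FIFO, such as collections.deque.

```python
import random

def quickselect(values, i):
    """Return the i-th smallest element (0-based) of values in expected
    O(n) time, without fully sorting. The three-way partition handles
    duplicate values correctly."""
    pivot = random.choice(values)
    less = [v for v in values if v < pivot]
    equal = [v for v in values if v == pivot]
    if i < len(less):
        return quickselect(less, i)
    if i < len(less) + len(equal):
        return pivot
    greater = [v for v in values if v > pivot]
    return quickselect(greater, i - len(less) - len(equal))
```

For example, with the window held in a deque (window = collections.deque(maxlen=n)), each query is just quickselect(list(window), i).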

In the worst case (which happens frequently), I have to sort the window for every new entry.

No, you should not have to do that, because you do not need to (fully) sort at all. See above. But if you did want to maintain the elements in sorted order, then you could exploit the fact that the existing elements are already sorted, which reduces the cost of keeping the list in order to no more than O(n) per inserted element, and possibly less (see below).

Maintaining a sorted index

I was thinking of maintaining indices over the values of the sliding window and maintaining a sorted list. That would save the sorting cost, but what would the insertion and deletion complexity be?

It depends on the details. In all cases, insertions into and deletions from the FIFO can be made O(1). Here are some of the more likely alternatives for maintaining an index:

BST index

Suppose you maintain an index in the form of a red/black tree, or some other form of self-balancing binary search tree. Then insertions into and deletions from the index are both O(log n). Selecting the ith element from such an index by in-order traversal costs O(i + log n), which is no worse than the O(n) of QuickSelect operating on unsorted data. For m insertions (and deletions) and K selections, that yields O(m log n + Kn). In the event that O(m) = O(K) -- the specified worst case -- that's O(Kn) overall.
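To make that selection cost concrete: with a plain (unaugmented) BST you can stop an in-order traversal as soon as you have visited i+1 nodes. A minimal sketch, assuming the same key/left/right node shape as the tree sketch in the other answer:

```python
def select_inorder(root, i):
    """Return the i-th smallest key (0-based) by iterative in-order
    traversal of a plain BST; visits at most i+1 nodes plus one
    root-to-leaf path, so O(i + log n) on a balanced tree."""
    stack, node = [], root
    while stack or node:
        while node:            # descend to the leftmost unvisited node
            stack.append(node)
            node = node.left
        node = stack.pop()     # next key in sorted order
        if i == 0:
            return node.key
        i -= 1
        node = node.right
    raise IndexError("rank out of range")
```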

Sorted, linear, random-access index

On the other hand, suppose you maintain a sorted, linear index of the current elements that supports random access. The random access allows O(1) selections, but it means that maintaining the index for each insertion and deletion costs O(n), mainly from moving elements around in the index. For m insertions (and deletions) and K selections, that yields O(mn + K). In the event that O(m) = O(K) -- the specified worst case -- that's O(Kn) overall.
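A minimal sketch of this alternative in Python, using a plain list kept sorted with the standard bisect module (the class and method names are illustrative):

```python
import bisect
from collections import deque

class SortedArrayWindow:
    """Sliding window plus a sorted, random-access index.
    push is O(n) (shifting within the sorted list); select is O(1)."""

    def __init__(self, n):
        self.n = n
        self.window = deque()  # FIFO of raw values, O(1) at both ends
        self.index = []        # the same values, kept sorted

    def push(self, value):
        if len(self.window) == self.n:
            oldest = self.window.popleft()
            # find and remove the evicted value from the index: O(n)
            del self.index[bisect.bisect_left(self.index, oldest)]
        self.window.append(value)
        bisect.insort(self.index, value)  # O(log n) search + O(n) shift

    def select(self, i):
        return self.index[i]  # O(1) random access by rank
```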

Sorted, linear, sequential-access index

On the third hand, suppose you maintain a sorted, linear index of the current elements that requires sequential access, such as a linked list. Selections from that index cost O(n), as do insertions into it. It is possible to arrange for O(1) deletions in your case, because you can know which node to delete without searching for it (see the sketch below), but since deletions are always paired with insertions once you have n elements, that doesn't really help you. For m insertions (and deletions) and K selections, that yields O(mn + Kn). In the event that O(m) = O(K) -- the specified worst case -- that's O(Kn) overall.
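The O(1)-deletion trick is worth spelling out: have the FIFO store linked-list node handles rather than raw values, so the oldest node can be unlinked without a search. A minimal, illustrative sketch:

```python
class ListNode:
    __slots__ = ("value", "prev", "next")
    def __init__(self, value):
        self.value, self.prev, self.next = value, None, None

class SortedLinkedIndex:
    """Sorted doubly linked list. insert/select: O(n); delete: O(1),
    because the FIFO hands us the exact node to unlink."""

    def __init__(self):
        self.head = None

    def insert(self, value):
        node = ListNode(value)
        if self.head is None or value <= self.head.value:
            node.next = self.head
            if self.head:
                self.head.prev = node
            self.head = node
        else:
            cur = self.head
            while cur.next and cur.next.value < value:  # O(n) walk
                cur = cur.next
            node.prev, node.next = cur, cur.next
            if cur.next:
                cur.next.prev = node
            cur.next = node
        return node  # the FIFO stores this handle

    def delete(self, node):
        # O(1): unlink via the stored handle, no search needed
        if node.prev:
            node.prev.next = node.next
        else:
            self.head = node.next
        if node.next:
            node.next.prev = node.prev

    def select(self, i):
        cur = self.head
        for _ in range(i):  # O(n): sequential access only
            cur = cur.next
        return cur.value
```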

Also, does a data structure like this already exist?

There's nothing really novel here. You just have a second data structure (or maybe a second view of the same data structure) that presents a different arrangement of the same data. The other data structure can be any of a multitude of kinds you already know.
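For example, in Python the third-party sortedcontainers library provides a SortedList that already behaves like the random-access sorted index described above, with insert and remove costs that are better than a plain list's in practice:

```python
from sortedcontainers import SortedList  # pip install sortedcontainers

index = SortedList()
for v in (3.1, 1.2, 2.7):
    index.add(v)          # keep the index sorted as values arrive
index.remove(1.2)         # drop the value just evicted from the FIFO
print(index[0])           # select by rank: smallest remaining value (2.7)
```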

Recommendation

None of the alternatives for maintaining a sorted index does asymptotically better than another, or than selecting on demand with QuickSelect, in the expressed worst-case scenario of one selection per insertion. All are O(Kn) overall in that case. From that perspective, any of the above approaches is as good as another (and all are asymptotic improvements over re-sorting the window for every query).

But inasmuch as the better cases are apparently those with fewer selections, it is relevant that when O(K) < O(m), using QuickSelect for selections asymptotically outperforms every variation on maintaining a sorted index considered here. Fast insertions and deletions win the day, and this is what I would go with based on the information available.

If there were cases where O(K) > O(m), then those would be better served by a random-access sorted index, on account of its fast selections. The sequential-access index was always an also-ran, but it's interesting to me that in no case is the BST index a clear winner.

John Bollinger