
This question is not specific to any programming language. So here is the question:

I have a sliding window of size n containing real numbers. When the window is full, each new insertion evicts the oldest value (FIFO). Every now and then (K times in total), I need the value at a specific index, which varies per query, among the sorted values inside the window. So I have to spend O(K*n log(n)) to sort the window and O(1) to obtain my value. Would it be possible to reduce this complexity? In the worst case (which happens frequently), I have to sort the window for every new entry.

I was thinking of maintaining indices over the values of the sliding window and maintaining a sorted list. That would save the sorting cost, but what would the insertion and deletion complexity be? Also, does a data structure like this already exist?

2 Answers


If you maintain the window (of size n) elements in an order statistic tree (https://en.wikipedia.org/wiki/Order_statistic_tree), then it will cost you O(log n) time to advance the window and O(log n) time to find the i-th largest element in the window for any i. This will be advantageous if you have to do the query often.

An order statistic tree is just a balanced binary search tree wherein each node is augmented with the size of its subtree, which lets you drill down directly to the element at a given rank.
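For concreteness, here is a minimal sketch of that core idea in Python. To keep the sketch short, the tree below is deliberately not self-balancing; a real order statistic tree would add red/black (or AVL) rebalancing so that the depth, and hence every operation, stays O(log n). All names are illustrative.

```python
# Minimal sketch of a size-augmented BST (the core of an order statistic
# tree). NOT self-balancing: a production version would rebalance (e.g.
# red/black) so depth stays O(log n). Deletion follows the same pattern
# as insertion (fix up sizes on the way back up) and is omitted for brevity.

class Node:
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None
        self.size = 1  # number of keys in the subtree rooted here

def size(node):
    return node.size if node else 0

def insert(node, key):
    if node is None:
        return Node(key)
    if key < node.key:
        node.left = insert(node.left, key)
    else:
        node.right = insert(node.right, key)
    node.size = 1 + size(node.left) + size(node.right)
    return node

def select(node, i):
    """Return the i-th smallest key (0-based) under node."""
    left = size(node.left)
    if i < left:
        return select(node.left, i)
    if i == left:
        return node.key
    return select(node.right, i - left - 1)
```

With rebalancing added, advancing the window is one O(log n) deletion plus one O(log n) insertion, and each rank query is one O(log n) select.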

Matt Timmermans

Efficiently selecting the ith element

So I have to spend O(K*n log(n)) to sort the window and O(1) to obtain my value. Would it be possible to reduce this complexity?

Yes. By using QuickSelect instead of fully sorting, you could reduce the cost of K selections to O(Kn) expected time (a median-of-medians pivot would make that a worst-case bound). If you add in the cost of m insertions and deletions at O(1) each, that's a total cost of O(m + Kn).
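As a hedged illustration, here is one common formulation of QuickSelect (the three-way-partition variant) in Python; the window itself can be any O(1) FIFO, such as collections.deque.

```python
import random

def quickselect(values, i):
    """Return the i-th smallest element (0-based) of values in expected
    O(n) time, without fully sorting. The three-way partition handles
    duplicate values correctly."""
    pivot = random.choice(values)
    less = [v for v in values if v < pivot]
    equal = [v for v in values if v == pivot]
    if i < len(less):
        return quickselect(less, i)
    if i < len(less) + len(equal):
        return pivot
    greater = [v for v in values if v > pivot]
    return quickselect(greater, i - len(less) - len(equal))
```

For example, with the window held in a deque (window = collections.deque(maxlen=n)), each query is just quickselect(list(window), i).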

In the worst case (which happens frequently), I have to sort the window for every new entry.

No, you should not have to do that, because you do not need to (fully) sort at all. See above. But if you did want to maintain the elements in sorted order, then you could exploit the fact that the existing elements are already sorted, which reduces the cost of keeping the list in order to no more than O(n) per inserted element, and possibly less (see below).

Maintaining a sorted index

I was thinking of maintaining indices over the values of the sliding window and maintaining a sorted list. That would save the sorting cost, but what would the insertion and deletion complexity be?

It depends on the details. In all cases, insertions into and deletions from the FIFO can be made O(1). Here are some of the more likely alternatives for maintaining an index:

BST index

Suppose you maintain an index in the form of a red/black tree, or some other form of self-balancing binary search tree. Then insertions into and deletions from the index are both O(log n). Selecting the ith element from such an index by in-order traversal costs O(i + log n), which is no worse than the O(n) of QuickSelect operating on unsorted data. For m insertions (and deletions) and K selections, that yields O(m log n + Kn). In the event that O(m) = O(K) -- the specified worst case -- that's O(Kn) overall.
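To make that selection cost concrete: with a plain (unaugmented) BST you can stop an in-order traversal as soon as you have visited i+1 nodes. A minimal sketch, assuming the same key/left/right node shape as the tree sketch in the other answer:

```python
def select_inorder(root, i):
    """Return the i-th smallest key (0-based) by iterative in-order
    traversal of a plain BST; visits at most i+1 nodes plus one
    root-to-leaf path, so O(i + log n) on a balanced tree."""
    stack, node = [], root
    while stack or node:
        while node:            # descend to the leftmost unvisited node
            stack.append(node)
            node = node.left
        node = stack.pop()     # next key in sorted order
        if i == 0:
            return node.key
        i -= 1
        node = node.right
    raise IndexError("rank out of range")
```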

Sorted, linear, random-access index

On the other hand, suppose you maintain a sorted, linear index of the current elements that supports random access. The random access allows O(1) selections, but it means that maintaining the index for each insertion and deletion costs O(n), mainly from moving elements around in the index. For m insertions (and deletions) and K selections, that yields O(mn + K). In the event that O(m) = O(K) -- the specified worst case -- that's O(Kn) overall.
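A minimal sketch of this alternative in Python, using a plain list kept sorted with the standard bisect module (the class and method names are illustrative):

```python
import bisect
from collections import deque

class SortedArrayWindow:
    """Sliding window plus a sorted, random-access index.
    push is O(n) (shifting within the sorted list); select is O(1)."""

    def __init__(self, n):
        self.n = n
        self.window = deque()  # FIFO of raw values, O(1) at both ends
        self.index = []        # the same values, kept sorted

    def push(self, value):
        if len(self.window) == self.n:
            oldest = self.window.popleft()
            # find and remove the evicted value from the index: O(n)
            del self.index[bisect.bisect_left(self.index, oldest)]
        self.window.append(value)
        bisect.insort(self.index, value)  # O(log n) search + O(n) shift

    def select(self, i):
        return self.index[i]  # O(1) random access by rank
```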

Sorted, linear, sequential-access index

On the third hand, suppose you maintain a sorted, linear index of the current elements that requires sequential access, such as a linked list. Selections from that index cost O(n), as do insertions into it. It is possible to arrange for O(1) deletions in your case, because you can know which node to delete without searching for it (see the sketch below), but since deletions are always paired with insertions once you have n elements, that doesn't really help you. For m insertions (and deletions) and K selections, that yields O(mn + Kn). In the event that O(m) = O(K) -- the specified worst case -- that's O(Kn) overall.
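The O(1)-deletion trick is worth spelling out: have the FIFO store linked-list node handles rather than raw values, so the oldest node can be unlinked without a search. A minimal, illustrative sketch:

```python
class ListNode:
    __slots__ = ("value", "prev", "next")
    def __init__(self, value):
        self.value, self.prev, self.next = value, None, None

class SortedLinkedIndex:
    """Sorted doubly linked list. insert/select: O(n); delete: O(1),
    because the FIFO hands us the exact node to unlink."""

    def __init__(self):
        self.head = None

    def insert(self, value):
        node = ListNode(value)
        if self.head is None or value <= self.head.value:
            node.next = self.head
            if self.head:
                self.head.prev = node
            self.head = node
        else:
            cur = self.head
            while cur.next and cur.next.value < value:  # O(n) walk
                cur = cur.next
            node.prev, node.next = cur, cur.next
            if cur.next:
                cur.next.prev = node
            cur.next = node
        return node  # the FIFO stores this handle

    def delete(self, node):
        # O(1): unlink via the stored handle, no search needed
        if node.prev:
            node.prev.next = node.next
        else:
            self.head = node.next
        if node.next:
            node.next.prev = node.prev

    def select(self, i):
        cur = self.head
        for _ in range(i):  # O(n): sequential access only
            cur = cur.next
        return cur.value
```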

Also, does a data structure like this already exist?

There's nothing really novel here. You just have a second data structure (or maybe a second view of the same data structure) that presents a different arrangement of the same data. The other data structure can be any of a multitude of kinds you already know.
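For example, in Python the third-party sortedcontainers library provides a SortedList that already behaves like the random-access sorted index described above, with insert and remove costs that are better than a plain list's in practice:

```python
from sortedcontainers import SortedList  # pip install sortedcontainers

index = SortedList()
for v in (3.1, 1.2, 2.7):
    index.add(v)          # keep the index sorted as values arrive
index.remove(1.2)         # drop the value just evicted from the FIFO
print(index[0])           # select by rank: smallest remaining value (2.7)
```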

Recommendation

None of the alternatives for maintaining a sorted index does asymptotically better than another, or than selecting on demand with QuickSelect, in the expressed worst-case scenario of one selection per insertion. All are O(Kn) overall in that case. From that perspective, any of the above approaches is as good as another (and all are asymptotic improvements over re-sorting the window for every query).

But inasmuch as the better cases are apparently those with fewer selections, it is relevant that when O(K) < O(m), using QuickSelect for selections asymptotically outperforms every variation on maintaining a sorted index considered here. Fast insertions and deletions win the day, and this is what I would go with based on the information available.

If there were cases where O(K) > O(m), then those would be better served by a random-access sorted index, on account of its fast selections. The sequential-access index was always an also-ran, but it's interesting to me that in no case is the BST index a clear winner.

John Bollinger