
I get integers from the user (one by one) and insert each into a sorted Vec at its correct position by running a binary search to find the insertion index. The problem is that when the user provides reverse-sorted input (one by one), insertion becomes expensive, O(n^2) overall, since on each insertion all of the current elements in the Vec have to be shifted to the right. Is there an algorithm that can handle this in less time?

Example:

[] <- 10
[10] <- 9 // Shift x1
[9, 10] <- 8 // Shift x2
[8, 9, 10] <- 7 // Shift x3
[7, 8, 9, 10] <- 6 // Shift x4
...
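
For reference, the approach described above can be sketched roughly as follows. This is a minimal illustration, not the asker's actual code: the function name `insert_sorted` is hypothetical, and it assumes `i32` values and that duplicates should be kept.

```rust
/// Hypothetical helper: inserts `value` into an already-sorted Vec,
/// keeping it sorted. Returns the index where the value was placed.
fn insert_sorted(v: &mut Vec<i32>, value: i32) -> usize {
    // Binary search for the first index whose element is greater than
    // `value`, so a duplicate lands after the existing equal elements.
    let idx = v.partition_point(|&x| x <= value);
    // `Vec::insert` shifts every element from `idx` onward one slot right.
    // For reverse-sorted input, `idx` is always 0, so everything moves.
    v.insert(idx, value);
    idx
}
```

Calling `insert_sorted` with 10, 9, 8, 7, 6 in turn always inserts at index 0, which is the worst case shown in the example.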
Renya Karasuma
  • Please provide a problem statement. Is it necessary to have a sorted array after each user input? Does it have to be an array, or might another data structure work? The answer may differ depending on the problem statement and constraints. – ivan_onys Feb 15 '21 at 13:11
  • To expand on that, if you only need a sorted array at the very end, then inserting normally and then sorting is probably the best. If you just need a sorted iterator after every insertion, you should consider a B-Tree or a red/black search tree. – Aplet123 Feb 15 '21 at 13:37
  • A `BinaryHeap` might be slightly better than insert-in-random-order-then-sort when you only need the array fully ordered at the very end (use [`into_sorted_vec`](https://doc.rust-lang.org/std/collections/struct.BinaryHeap.html#method.into_sorted_vec)). It will also be better when you only need to keep track of the minimum or maximum value after each insertion. – trent Feb 15 '21 at 14:38
  • @trentcl `BinaryHeap` is useful if you need to query the min/max while running, but is there any advantage over a regular sort if you only need the fully sorted collection at the end? – Masklinn Feb 15 '21 at 14:44
  • I want the vec to be sorted all the time (after every insertion). And I was trying to avoid B or red/black trees. – Renya Karasuma Feb 15 '21 at 15:00
  • @Masklinn Maybe. Both algorithms are technically O(*n* log *n*), but with heapsort you can do some of that work (an O(*n*) amount, admittedly) while you're building the heap, which might be IO-limited, whereas with `[T]::sort` you have to wait until you have all the data. The actual speed probably depends both on *n* and on the initial ordering of the sequence (I'm not sure of the pathologies of `[T]::sort`, but it's based on a hybrid quicksort/heapsort, so it will probably be slightly worse than a straight heapsort in some cases, even counting heapification). – trent Feb 15 '21 at 15:02
  • @trentcl `[T]::sort` is best when the slice is mostly sorted, and runs in linear time when the slice is sorted or consists of sorted sections. `[T]::sort_unstable` is generally faster, and runs in linear time when the slice is in ascending order, all equal, in descending order, or has only one element out of place. Both are O(n log n) in the worst case. – Aiden4 Feb 15 '21 at 19:21
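
To make the two alternatives discussed in these comments concrete, here is a rough sketch of each. It only applies when the collection needs to be sorted once at the very end, which is not the asker's requirement. The function names `sorted_via_heap` and `sorted_via_sort` are hypothetical, and no claim is made here about which is faster in practice.

```rust
use std::collections::BinaryHeap;

/// Hypothetical helper: pushes values one at a time into a heap,
/// then converts the heap into an ascending Vec once at the end.
fn sorted_via_heap(input: &[i32]) -> Vec<i32> {
    let mut heap = BinaryHeap::new();
    for &x in input {
        // The heap stays partially ordered, so the max is always available.
        heap.push(x);
    }
    heap.into_sorted_vec()
}

/// Hypothetical helper: collects everything first, then sorts once.
fn sorted_via_sort(input: &[i32]) -> Vec<i32> {
    let mut v = input.to_vec();
    v.sort_unstable();
    v
}
```

With the heap, `peek` gives the current maximum after every push, which is the case where it helps even before the final sort.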

1 Answer


> The problem is that when the user provides reverse-sorted input (one by one), insertion becomes expensive, O(n^2) overall, since on each insertion all of the current elements in the Vec have to be shifted to the right.

The Vec implementation shifts all the contents at once (using a memcpy), so shifting 20 items versus shifting 1 doesn't really make any difference. If the collection is huge, memory traffic will start being a concern, but for small collections you can treat the shift as a constant cost.

> Is there an algorithm that can handle this in less time?

An intrinsically sorted tree-based data structure. But the Rust standard library is somewhat limited on that front, and a BTreeSet will only work if you're deduplicating anyway. Not sure it will beat a regular Vec though, as it'll have a higher number of allocations.
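
If duplicates need to be kept, one possible workaround (an assumption on my part, not something the standard library offers as a ready-made type) is to store a count per value in a `BTreeMap`. The `SortedCounts` type below is a hypothetical sketch for `i32` values. Whether it beats a plain Vec would have to be measured.

```rust
use std::collections::BTreeMap;

/// Hypothetical sorted multiset: maps each value to its occurrence count.
#[derive(Default)]
struct SortedCounts {
    counts: BTreeMap<i32, usize>,
}

impl SortedCounts {
    /// O(log n) in the number of distinct values; nothing is shifted.
    fn insert(&mut self, value: i32) {
        *self.counts.entry(value).or_insert(0) += 1;
    }

    /// Iterates in ascending order, repeating each value by its count.
    fn iter(&self) -> impl Iterator<Item = i32> + '_ {
        self.counts
            .iter()
            .flat_map(|(&value, &n)| std::iter::repeat(value).take(n))
    }
}
```

The trade-off is that there is no O(1) access by position: reading the k-th smallest element means walking the iterator.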

And while a LinkedList theoretically provides O(1) insertion, Rust doesn't provide an insertion API because there's no Cursor, so you'd be paying O(n-i) to look for the insertion index, following which insert() would be paying that again to traverse to the index in question and insert the new item.

Masklinn
  • Interesting what you are saying regarding shifting (that it's always the same cost). So how can you explain this? Sorted input: very fast. Random input: moderate. Reverse-sorted input: very slow. – Renya Karasuma Feb 15 '21 at 14:59
  • Tiny nitpick: `Vec`s use the equivalent of `memmove` rather than `memcpy`, because the source and destination overlap. – Aiden4 Feb 15 '21 at 20:58