24

This is a continuation of questions like this one.

Are there any guidelines for tweaking the performance? I don't mean gains in big-O, just saving some linear time.

For example, how much does pre-sorting save on either SortedList or SortedDictionary?

Say I have a person-class with 3 properties to sort on, one of them is age in years. Should I bucket the objects on age first?

Should I first sort on one property, then use the resulting list/dictionary to sort on two properties and so on?

Any other optimizations that spring to mind?

Community
  • 1
  • 1
Martin
  • 2,956
  • 7
  • 30
  • 59
  • 1
    Have you tried profiling your code to make sure that initializing your sorted datasctructures is in fact the bottleneck in your code? – Mark Byers Jan 10 '10 at 11:56
  • 2
    So far it's a hypothetical question, but yes, this will be the bottleneck, by far. – Martin Jan 10 '10 at 15:32
  • I can't remember but I suppose I was assuming that all methods were asymptotically equal in performance and but maybe differed in average (O(1)) performance depending on the use case. – Martin Feb 03 '14 at 14:14
  • The question you linked has the documentation part which clarifies the effect of pre-sorting. – nawfal May 21 '14 at 19:49

1 Answers1

61

Well, it's an easy win on SortedList. Inserting an item requires a binary search (O(log(n)) to find the insertion point, then a List.Insert (O(n)) to insert the item. The Insert() dominates, populating the list requires O(n^2). If the input items are already sorted then the Insert collapses to O(1) but doesn't affect the search. Populating is now O(nlog(n)). You don't worry how big the Oh is, sorting first is always more efficient. Assuming you can afford the doubled storage requirement.

SortedDictionary is different, it uses a red-black tree. Finding the insertion point requires O(log(n)). Rebalancing the tree might be required afterwards, that also takes O(log(n)). Populating the dictionary thus takes O(nlog(n)). Using sorted input does not change the effort to find the insertion point or rebalancing, it is still O(nlog(n)). Now the Oh matters though, inserting sorted input requires the tree to constant rebalance itself. It works better if the input is random, you don't want sorted input.

So populating SortedList with sorted input and populating SortedDictionary with unsorted input is both O(nlog(n)). Ignoring the cost of providing sorted input, the Oh of SortedList is smaller than the Oh of SortedDictionary. That's an implementation detail due to the way List allocates memory. It only has to do so O(log(n)) times, a red-black tree has to allocate O(n) times. Very small Oh btw.

Notable is that neither one compares favorably over simply populating a List, then calling Sort(). That's also O(nlog(n)). In fact, if input is already accidentally sorted you can bypass the Sort() call, this collapses to O(n). The cost analysis now needs to move to the effort it takes to get the input sorted. It is hard to bypass the fundamental complexity of Sort(), O(nlog(n)). It might not be readily visible, you might get the input sorted by, say, a SQL query. It will just take longer to complete.

The point of using either SortedList or SortedDictonary is to keep the collection sorted after inserts. If you only worry about populating but not mutating then you shouldn't use those collections.

Hans Passant
  • 922,412
  • 146
  • 1,693
  • 2,536
  • 5
    Sidenote: If the data can be sorted using a non-comparative method such as Radix Sort then sorting can be pseudo-linear which (depending on the length of the "radix" compared to the input) collapses to O(n) time to sort even for unsorted inputs, in which case making a list and using Sort() may be quicker. – apokryfos Mar 23 '12 at 11:04
  • (sidenote#2) "The point of using [...] SortedDictonary is to keep the collection sorted after inserts" that's true. However, I've come across another corner-case scenario in which SortedDictionary >> 'create List + .Sort()': I had to use a temp sorted-data-structure to perform validation against a hefty data-set. A SortedDictionary built node-by-node performed the validation in each insertion. It could thus fail-fast while allocating less memory. A list would allocate the entire array before calling sort() thus straining memory a bit (if memory is abundant then the list is better of course) – XDS Apr 20 '20 at 15:35