5

I have an unsorted Collection of objects [that are comparable]. Is it possible to get a sublist of the N smallest items in the collection without having to call sort?

I was looking at the possibility of doing a SortedList with a limited capacity, but that didn't look like the right option.

I could easily write this, but I was wondering if there was another way.

I am not able to modify the existing collection's structure.

monksy

4 Answers

5

Since you don't want to call sort(), it seems like you are trying to avoid an O(n log(n)) runtime cost. There is actually a way to do that in O(n) time -- you can use a selection algorithm.

There are methods to do this in the Guava libraries (Google's core Java libraries); look in Ordering and check out leastOf() and greatestOf().

These are implementations of quickselect, and since they're written generically, you could just call them on your Set and get a list of the k smallest things. If you don't want to use the entire Guava libraries, the docs link to the source code, and I think it should be straightforward to port the methods to your project.
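For anyone who would rather not pull in Guava, here is a minimal quickselect sketch in plain Java. It is not Guava's implementation; the class and method names (QuickSelectSketch, kSmallest) are made up for illustration. It copies the collection first so the original is left unmodified, runs in expected O(n) time thanks to the random pivot, and returns the k smallest elements in no particular order.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ThreadLocalRandom;

// Hypothetical helper class, not part of any library.
class QuickSelectSketch {

    /** Returns the k smallest elements of items, in no particular order. */
    static <T extends Comparable<? super T>> List<T> kSmallest(Collection<T> items, int k) {
        List<T> a = new ArrayList<>(items);   // work on a copy; the caller's collection is untouched
        if (k <= 0) return new ArrayList<>();
        if (k >= a.size()) return a;

        int lo = 0, hi = a.size() - 1;
        // Invariant: lo <= k - 1 <= hi. Stop once index k - 1 holds the k-th smallest element.
        while (lo < hi) {
            int p = partition(a, lo, hi);
            if (p == k - 1) break;
            if (p < k - 1) lo = p + 1; else hi = p - 1;
        }
        return new ArrayList<>(a.subList(0, k));
    }

    // Lomuto partition around a random pivot; returns the pivot's final index.
    private static <T extends Comparable<? super T>> int partition(List<T> a, int lo, int hi) {
        int r = ThreadLocalRandom.current().nextInt(lo, hi + 1);
        Collections.swap(a, r, hi);           // move the pivot to the end
        T pivot = a.get(hi);
        int store = lo;
        for (int i = lo; i < hi; i++) {
            if (a.get(i).compareTo(pivot) < 0) {
                Collections.swap(a, i, store++);
            }
        }
        Collections.swap(a, store, hi);       // put the pivot in its final place
        return store;
    }
}
```

If you need the result in sorted order, sort the returned list afterwards; that only costs O(k log(k)).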

If you don't want to deviate too far from the standard libraries, you can always use a sorted set like TreeSet, though this gets you logarithmic insert/remove time instead of the nice O(1) performance of the hash-based Set, and it ends up being O(n log(n)) in the end. Others have mentioned using heaps. This will also get you O(n log(n)) running time in the worst case (O(n log(k)) if you cap the heap at k elements), unless you use some of the fancier heap variants. There's a Fibonacci heap implementation in GraphMaker if you're looking for one of those.

Which of these makes sense really depends on your project, but I think that covers most of the options.

Todd Gamblin
1

I would probably create a sorted set. Insert the first N items from your unsorted collection into your sorted set. Then for the remainder of your unsorted collection:

  1. insert each item in the sorted set
  2. delete the largest item from the sorted set
  3. Repeat until you've processed all items in the unsorted collection
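The steps above could be sketched roughly like this with java.util.TreeSet. The class and method names are invented for illustration, and the loop checks the set's size rather than pre-filling it with the first N items, which amounts to the same thing. Note that a TreeSet silently drops duplicates, so this yields the N smallest distinct values.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.TreeSet;

// Hypothetical helper class, not part of any library.
class BoundedSortedSetSketch {

    /** Returns the n smallest distinct elements of items, in ascending order. */
    static <T extends Comparable<? super T>> List<T> nSmallestDistinct(Collection<T> items, int n) {
        if (n <= 0) return new ArrayList<>();
        TreeSet<T> smallest = new TreeSet<>();
        for (T item : items) {
            smallest.add(item);              // O(log n) insert; duplicates are ignored
            if (smallest.size() > n) {
                smallest.pollLast();         // drop the largest so at most n items remain
            }
        }
        return new ArrayList<>(smallest);    // TreeSet iterates in ascending order
    }
}
```

The source collection is only iterated, never modified, and the set never holds more than n + 1 elements at a time.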
Jerry Coffin
  • That's not a bad solution, since it still is considered to be n operations on my part. [Yes, there is sorting on the SortedSet's part] – monksy Mar 29 '11 at 05:08
  • This only works if the initial collection is a Set. If it is a list, then the sorted set will discard duplicates ... – Stephen C Mar 29 '11 at 05:20
  • Well you could use a SortedBag to solve that: http://bit.ly/e2mave. But there are also less expensive options than sorting the whole data set. – Todd Gamblin Mar 29 '11 at 05:48
  • If you want to go this way, a PriorityQueue would be much more efficient. – Kevin Bourrillion Mar 29 '11 at 07:59
  • @Kevin: It might be, but of course that's not really guaranteed, and even at best "much" more efficient is unlikely [e.g., insertion into either a priority queue or a tree-based set will be O(log N)]. – Jerry Coffin Mar 29 '11 at 17:14
  • @Stephen C: It is true that I treated it as "N smallest unique values". Obviously, if you want to allow duplicates, you use a multiset instead of a set. – Jerry Coffin Mar 29 '11 at 17:15
  • It appears that if you want the results in sorted order, then yeah, sometimes a SortedSet can beat a PQ. My mistake. However, Ordering.leastOf() (the current top-rated answer) is 3-5x faster than either of these anyway. – Kevin Bourrillion Mar 29 '11 at 20:17
  • @Kevin: It's nice to have actual numbers from the lead Guava developer. – Todd Gamblin Mar 29 '11 at 20:39
  • @Kevin: yes -- and if it was built into the standard library, I'd have no problem with recommending it (e.g., in C++, which does include it, I've done so). http://stackoverflow.com/questions/5380568/algorithm-to-find-k-smallest-numbers-in-array-of-n-items/5380672#5380672. Its omission from Java's standard library strikes me as unfortunate, but such is life. – Jerry Coffin Mar 29 '11 at 21:53
1

Yes, you can put them into a max heap data structure with a fixed size of N, adding an item only if the heap is not yet full or the item is smaller than the largest in the max heap (checked with the get() "peek" method), in which case it replaces that largest item. Once you have done so, the heap will, by definition, hold the N smallest. With a binary heap this runs in O(M lg(N)) time (where M is the size of the set), which is effectively O(M) when N is small and fixed. Here's some pseudocode:

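MaxHeap maxHeap = new MaxHeap(N);
for (Item x : mySetOfItems) {
  if (maxHeap.size() < N) {
    maxHeap.add(x);            // heap not full yet: always add
  } else if (x < maxHeap.get()) {
    maxHeap.removeMax();       // evict the current largest to stay at size N
    maxHeap.add(x);
  }
}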
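For a runnable version of the same idea using only the standard library, one option is java.util.PriorityQueue with a reversed comparator acting as the max heap. This is a sketch under that assumption; the class and method names are hypothetical. Unlike a sorted set, it keeps duplicates.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical helper class, not part of any library.
class BoundedMaxHeapSketch {

    /** Returns the n smallest elements of items (duplicates kept), in ascending order. */
    static <T extends Comparable<? super T>> List<T> nSmallest(Collection<T> items, int n) {
        if (n <= 0) return new ArrayList<>();
        // Reversing the natural order turns PriorityQueue's min heap into a max heap.
        PriorityQueue<T> maxHeap = new PriorityQueue<>(n, Collections.reverseOrder());
        for (T x : items) {
            if (maxHeap.size() < n) {
                maxHeap.add(x);                          // heap not full yet: always add
            } else if (x.compareTo(maxHeap.peek()) < 0) {
                maxHeap.poll();                          // evict the current largest
                maxHeap.add(x);
            }
        }
        List<T> result = new ArrayList<>(maxHeap);       // heap order is not sorted order
        Collections.sort(result);
        return result;
    }
}
```

The final sort touches at most n elements, so it can be dropped if the order of the result doesn't matter.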

The Apache Commons Collections class PriorityBuffer seems to be their flagship binary heap data structure; try using that one.

maerics
0

http://en.wikipedia.org/wiki/Heap_%28data_structure%29

Don't you just want to make a heap?

dting