
Assume an application producing a number of HashMap<String, MyClass> data structures, each containing dozens to hundreds of Comparable objects of type MyClass, which need to end up in a single, sorted Collection.

Two possible implementations of this functionality return a SortedSet or a sorted List, as follows:

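// Both methods assume import java.util.*; and that MyClass implements Comparable<MyClass>.

// Merges all map values into a TreeSet, which keeps them sorted as they are inserted.
@SafeVarargs
public static Set<MyClass> getSortedSet(HashMap<String, MyClass>... allMaps)
{
    SortedSet<MyClass> set = new TreeSet<MyClass>();

    for (HashMap<String, MyClass> map : allMaps)
    {
        // elements whose compareTo returns 0 count as duplicates and are kept only once
        set.addAll(map.values());
    }

    return set;
}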

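// Collects all map values into an ArrayList, then sorts the whole list once at the end.
@SafeVarargs
public static List<MyClass> getSortedList(HashMap<String, MyClass>... allMaps)
{
    List<MyClass> list = new ArrayList<MyClass>();

    for (HashMap<String, MyClass> map : allMaps)
    {
        list.addAll(map.values());  // appends; may grow the backing array
    }

    Collections.sort(list);  // single O(n log n) sort

    return list;
}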
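To try both methods end to end, here is a minimal self-contained sketch. MyClass is a hypothetical stand-in (the question never defines it) that is ordered by an int key; the class name SortedMergeDemo, the helper mapOf and the sample data are likewise illustrative assumptions.

```java
import java.util.*;

// Hypothetical stand-in for the question's MyClass: ordered by an int key.
final class MyClass implements Comparable<MyClass> {
    final int key;

    MyClass(int key) { this.key = key; }

    @Override
    public int compareTo(MyClass other) { return Integer.compare(key, other.key); }

    @Override
    public String toString() { return "MyClass(" + key + ")"; }
}

class SortedMergeDemo {
    // Same logic as getSortedSet: a TreeSet keeps elements sorted as they are added.
    @SafeVarargs
    static Set<MyClass> getSortedSet(HashMap<String, MyClass>... allMaps) {
        SortedSet<MyClass> set = new TreeSet<>();
        for (HashMap<String, MyClass> map : allMaps) {
            set.addAll(map.values());
        }
        return set;
    }

    // Same logic as getSortedList: append everything, then sort once.
    @SafeVarargs
    static List<MyClass> getSortedList(HashMap<String, MyClass>... allMaps) {
        List<MyClass> list = new ArrayList<>();
        for (HashMap<String, MyClass> map : allMaps) {
            list.addAll(map.values());
        }
        Collections.sort(list);
        return list;
    }

    // Hypothetical helper: builds a map whose String keys are "k<key>".
    static HashMap<String, MyClass> mapOf(int... keys) {
        HashMap<String, MyClass> map = new HashMap<>();
        for (int k : keys) {
            map.put("k" + k, new MyClass(k));
        }
        return map;
    }

    public static void main(String[] args) {
        HashMap<String, MyClass> a = mapOf(5, 1, 9);
        HashMap<String, MyClass> b = mapOf(3, 7);
        System.out.println(getSortedSet(a, b));
        System.out.println(getSortedList(a, b));
    }
}
```

With no duplicate keys in the data, both calls print the same five elements in ascending order, so the choice between them comes down to performance and duplicate semantics.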

Would there be any clear performance advantage to either of the two methods above?

Is there a faster way of implementing the same functionality?

PNS
  • If you're wondering which is faster, why not measure them on your actual data? – NPE May 14 '12 at 13:48
  • Because somebody else is going to use that code! All I am asking is whether there is a profound reason for one implementation or the other to be faster! – PNS May 14 '12 at 13:50
  • You can always run a load test to find out which performs better. – Sridhar G May 14 '12 at 14:12

3 Answers


Some problems with your sorted list method:

ArrayLists are backed by arrays. Whenever you add elements, the ArrayList may have to grow (reallocate and copy) its backing array behind the scenes. If you want to use this approach, you should create the ArrayList with the proper capacity beforehand.
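A sketch of the presizing suggestion: sum the map sizes first so the ArrayList is allocated once at its final capacity, then sort. It is written generically over any Comparable type rather than MyClass, and the names PresizedMerge and mergeSortedPresized are hypothetical. Note that ArrayList.addAll already grows the array at most once per call, so the saving may be modest and is worth measuring.

```java
import java.util.*;

class PresizedMerge {
    // Collects the values of all maps into one list sized up front, then sorts it once.
    @SafeVarargs
    static <T extends Comparable<? super T>> List<T> mergeSortedPresized(Map<String, T>... allMaps) {
        int total = 0;
        for (Map<String, T> map : allMaps) {
            total += map.size();                // final element count
        }
        List<T> list = new ArrayList<>(total);  // single allocation of the backing array
        for (Map<String, T> map : allMaps) {
            list.addAll(map.values());          // capacity already fits, so no regrowth
        }
        Collections.sort(list);                 // one O(n log n) sort at the end
        return list;
    }
}
```

Unlike the TreeSet version, this keeps elements that compare as equal.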

Sorting after you have added all the elements seems non-optimal. Why not add each element into the list at its correct position? (Use a sorted collection, then turn it into a list; see "A good Sorted List for Java".)
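One way to sketch "add each element at its correct position" is with Collections.binarySearch, which returns (-(insertion point) - 1) when the key is absent. This is an illustrative technique, not code from the linked article, and the names SortedInsert, insertSorted and mergeByInsertion are hypothetical. Each insert into an ArrayList shifts the later elements, so n inserts can cost O(n^2) element moves in the worst case; for a one-shot merge, a single sort at the end may well be faster.

```java
import java.util.*;

class SortedInsert {
    // Inserts value into an already-sorted list, keeping the list sorted.
    static <T extends Comparable<? super T>> void insertSorted(List<T> sorted, T value) {
        int pos = Collections.binarySearch(sorted, value);
        if (pos < 0) {
            pos = -pos - 1;       // decode the insertion point for an absent key
        }
        sorted.add(pos, value);   // shifts later elements right (O(n) for ArrayList)
    }

    // Builds a sorted list from all map values by inserting them one at a time.
    @SafeVarargs
    static <T extends Comparable<? super T>> List<T> mergeByInsertion(Map<String, T>... allMaps) {
        List<T> sorted = new ArrayList<>();
        for (Map<String, T> map : allMaps) {
            for (T value : map.values()) {
                insertSorted(sorted, value);
            }
        }
        return sorted;
    }
}
```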

To actually answer your question, I would go with the approach that uses a TreeSet behind the scenes, because if the user needs one, they can always call Set.toArray() to get an array, or pass the set to the ArrayList constructor to get a list.
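Both conversions can be sketched as follows. The ArrayList(Collection) constructor copies elements in the collection's iteration order, which for a TreeSet is ascending; the class SetToList and its toList helper are hypothetical names used only for illustration.

```java
import java.util.*;

class SetToList {
    // Copies a sorted set into a new ArrayList; the ascending order is preserved.
    static <T> List<T> toList(SortedSet<T> set) {
        return new ArrayList<>(set);
    }

    public static void main(String[] args) {
        SortedSet<String> set = new TreeSet<>(Arrays.asList("pear", "apple", "fig"));
        List<String> list = toList(set);              // independent copy, not a view
        String[] array = set.toArray(new String[0]);  // or an array, via Set.toArray
        System.out.println(list);
        System.out.println(Arrays.toString(array));
    }
}
```

Both lines print the three strings in alphabetical order.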

Colin D
  • This seems to be in line with http://stackoverflow.com/questions/6971152/list-with-comparable-vs-treeset. A generic solution for converting from Set to List can be found at http://stackoverflow.com/questions/740299/how-do-i-sort-a-set-to-a-list-in-java. Thanks! – PNS May 14 '12 at 14:08

Sets and Lists differ by nature: a Set does not keep duplicates, so it can hold only one instance of any given object, whereas a List allows duplicates.

As such, a Set does more work on every insertion (it must check whether the element is already present), so it can be slower.
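To make the duplicate handling concrete: a TreeSet is backed by a red-black tree and decides membership with compareTo (or its Comparator), not equals, so two distinct objects that compare as equal collapse into one, while a sorted list keeps both. The Item class and the sample data below are hypothetical.

```java
import java.util.*;

class DuplicateDemo {
    // Hypothetical element type ordered only by priority; compareTo ignores the name.
    static final class Item implements Comparable<Item> {
        final String name;
        final int priority;

        Item(String name, int priority) {
            this.name = name;
            this.priority = priority;
        }

        @Override
        public int compareTo(Item other) {
            return Integer.compare(priority, other.priority);
        }
    }

    public static void main(String[] args) {
        List<Item> items = Arrays.asList(new Item("a", 1), new Item("b", 2), new Item("c", 1));

        // TreeSet: "c" compares equal to "a" (same priority), so it is silently dropped.
        Set<Item> set = new TreeSet<>(items);

        // ArrayList + sort: all three elements are kept (the sort is stable).
        List<Item> list = new ArrayList<>(items);
        Collections.sort(list);

        System.out.println(set.size() + " vs " + list.size());
    }
}
```

This prints "2 vs 3". If MyClass's compareTo can return 0 for distinct objects, the two methods in the question do not return the same elements.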

srini.venigalla
  • Sure, but let's say there are no duplicates. Is there any reason to assume that one implementation will be always faster than the other? – PNS May 14 '12 at 13:52
  • Maybe it's not about duplicates being present or not, but about the instructions to check that an element isn't already present in the set. So the answer still applies. – Carlo May 14 '12 at 13:54
  • @PNS see Colin D's answer below. Using a TreeSet and letting the user derive an array when needed is a good approach. – srini.venigalla May 14 '12 at 14:04
  • His answer was placed above, but got it. Thanks! – PNS May 14 '12 at 15:25

Is there any reason to assume that one implementation will be always faster than the other?

No, there is no reason to assume that.

Which is faster could depend on the quantity of data, on its properties, on the performance characteristics of the comparator, on your JDK, on the JIT compiler etc.

The only way to know for sure is by benchmarking the code on realistic data.
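As a starting point for such a measurement, here is a deliberately crude timing harness. It uses random Integer values as a hypothetical stand-in for MyClass, and the sizes (200 maps of 100 values) are arbitrary. Single-shot System.nanoTime timings are distorted by JIT warm-up and garbage collection, so a dedicated harness such as JMH is the more reliable tool; treat any numbers this prints as rough.

```java
import java.util.*;

class CrudeBenchmark {
    // Builds `count` maps of `size` random Integers each (hypothetical test data).
    static List<Map<String, Integer>> randomMaps(int count, int size, long seed) {
        Random rnd = new Random(seed);
        List<Map<String, Integer>> maps = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            Map<String, Integer> map = new HashMap<>();
            for (int j = 0; j < size; j++) {
                map.put(i + ":" + j, rnd.nextInt());  // unique keys, random values
            }
            maps.add(map);
        }
        return maps;
    }

    // Set-based approach: sorted on insertion, duplicates removed.
    static SortedSet<Integer> viaTreeSet(List<Map<String, Integer>> maps) {
        SortedSet<Integer> set = new TreeSet<>();
        for (Map<String, Integer> m : maps) {
            set.addAll(m.values());
        }
        return set;
    }

    // List-based approach: append everything, sort once.
    static List<Integer> viaSortedList(List<Map<String, Integer>> maps) {
        List<Integer> list = new ArrayList<>();
        for (Map<String, Integer> m : maps) {
            list.addAll(m.values());
        }
        Collections.sort(list);
        return list;
    }

    public static void main(String[] args) {
        List<Map<String, Integer>> maps = randomMaps(200, 100, 42L);
        // Several rounds, so later rounds run on JIT-compiled code.
        for (int round = 0; round < 5; round++) {
            long t0 = System.nanoTime();
            int a = viaTreeSet(maps).size();
            long t1 = System.nanoTime();
            int b = viaSortedList(maps).size();
            long t2 = System.nanoTime();
            System.out.printf("round %d: TreeSet %d us (%d elems), sorted list %d us (%d elems)%n",
                    round, (t1 - t0) / 1000, a, (t2 - t1) / 1000, b);
        }
    }
}
```

Replacing the random Integers with realistic MyClass data and its real compareTo is what would make the comparison meaningful.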

NPE
  • Someone has pointed to http://stackoverflow.com/questions/6971152/list-with-comparable-vs-treeset but then removed the comment. It does seem that TreeSet is indeed faster, though. – PNS May 14 '12 at 14:07
  • @PNS: That's a different problem (they sort the list after each insertion, whereas you only sort at the end). – NPE May 14 '12 at 14:22