I have two sorted lists of unique items and I want to find their set difference and set intersection in a fast and cache-friendly way, such as with the C++ std::set_difference and std::set_intersection.

However, I am now working in Kotlin and cannot find the corresponding functionality. Since the Kotlin standard library is built on top of the Java standard library, a Java answer is welcome.

I read through this and this great question and all their answers, but as far as I can see, they deal only with arbitrary sets, thus forfeiting the sortedness.

The same goes for Guava.

NathanOliver
Martin Drozdik
  • Removing the C++ tag as this isn't actually about C++ code. – NathanOliver Sep 13 '18 at 15:52
  • How about using a `TreeSet` to achieve sorting? – Phenomenal One Sep 13 '18 at 16:02
  • Do both sorted lists contain unique items? – Maxim Sep 13 '18 at 16:05
  • @MaruthiAdithya That is certainly possible, but it has worse complexity and worse cache locality than just using lists. Check out the C++ links I posted. – Martin Drozdik Sep 13 '18 at 16:06
  • @Maxim Yes, that is the case – Martin Drozdik Sep 13 '18 at 16:06
  • @MartinDrozdik May I know the datatype of items? – Phenomenal One Sep 13 '18 at 16:12
  • I don't think there are any built-in methods for this kind of operation in the Java library; not sure about Guava though. – Bubletan Sep 13 '18 at 16:12
  • Just a nit-pick: your assertion that "the Kotlin standard library is built on top of the Java standard library" is valid only for the Kotlin JVM variant. It's not true for Kotlin Native (or Kotlin JS). – DodgyCodeException Sep 13 '18 at 16:13
  • @MaruthiAdithya Let's say `Double`, but it shouldn't really matter. @DodgyCodeException Thanks! Didn't know that. – Martin Drozdik Sep 13 '18 at 16:15
  • Another minor point: in C++ you have true cache locality when the data is stored by value in each element. But in Java, if the type is `Double`, that's a wrapper object with the actual `double` data being scattered all over the heap. – DodgyCodeException Sep 13 '18 at 16:21
  • This is a very specific task, so I think you should write the methods yourself. The C++ implementation is simple: it uses the merge step of merge sort, without the sorting. In Java there is the `retainAll()` method to find the intersection, but its complexity is O(n*m) because it does not use the fact that both lists are sorted. – Maxim Sep 13 '18 at 16:27
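To illustrate the cache-locality point raised in the comments above, here is a minimal sketch of the same merge-style walk over primitive `double[]` arrays, which store their values contiguously instead of as boxed `Double` objects. The class name `PrimitiveSortedIntersect` is hypothetical, and the sketch assumes both arrays are sorted ascending (in the order used by `Double.compare`) and contain no duplicates; neither assumption is checked.

```java
import java.util.Arrays;

// Hypothetical helper, not part of any library.
final class PrimitiveSortedIntersect {

    /** Returns the elements present in both arrays; inputs are assumed sorted and unique. */
    static double[] intersect(double[] a, double[] b) {
        // The intersection can never be larger than the smaller input.
        double[] out = new double[Math.min(a.length, b.length)];
        int i = 0, j = 0, n = 0;
        while (i < a.length && j < b.length) {
            int cmp = Double.compare(a[i], b[j]);
            if (cmp < 0) {
                i++;                // a[i] is smaller, so it cannot occur in b
            } else if (cmp > 0) {
                j++;                // b[j] is smaller, so it cannot occur in a
            } else {
                out[n++] = a[i];    // common element
                i++;
                j++;
            }
        }
        return Arrays.copyOf(out, n); // shrink to the number of matches
    }

    public static void main(String[] args) {
        double[] common = intersect(new double[] {1.0, 2.5, 4.0},
                                    new double[] {2.5, 3.0, 4.0});
        System.out.println(Arrays.toString(common)); // prints [2.5, 4.0]
    }
}
```

The scratch array is sized for the worst case and copied down at the end, which trades one extra copy for never having to grow during the walk.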

1 Answer

Here is a merge-intersect implementation that runs in O(n+m) in the worst case:

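import java.util.ArrayList;
import java.util.List;

// Intersects two lists that are sorted in ascending order and contain unique items.
// Both lists are walked once, as in the merge step of merge sort.
// get(int) is assumed to be O(1), as it is for ArrayList.
static <T extends Comparable<? super T>> List<T> intersect(List<T> list1, List<T> list2) {
    final int size1 = list1.size();
    final int size2 = list2.size();
    // The intersection cannot be larger than the smaller list.
    final List<T> result = new ArrayList<>(Math.min(size1, size2));

    int i = 0;
    int j = 0;
    while (i < size1 && j < size2) {
        T a = list1.get(i);
        int compare = a.compareTo(list2.get(j));
        if (compare < 0)
            i++;            // a is smaller than the current list2 item: skip it
        else if (compare > 0)
            j++;            // the list2 item is smaller: skip it
        else {
            result.add(a);  // equal: the item is in both lists
            i++;
            j++;
        }
    }

    return result;
}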
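
The question also asks for set difference, which the code above does not cover. As a minimal sketch under the same assumptions (both lists sorted ascending, unique items, O(1) `get`), the same walk can be adapted to collect the items of `list1` that are absent from `list2`. The class name `SortedDifference` and the method name `difference` are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of any library.
final class SortedDifference {

    /** Returns the items of list1 that do not occur in list2; inputs are assumed sorted and unique. */
    static <T extends Comparable<? super T>> List<T> difference(List<T> list1, List<T> list2) {
        final int size1 = list1.size();
        final int size2 = list2.size();
        final List<T> result = new ArrayList<>();

        int i = 0;
        int j = 0;
        while (i < size1 && j < size2) {
            T a = list1.get(i);
            int compare = a.compareTo(list2.get(j));
            if (compare < 0) {
                result.add(a);  // a is smaller than everything left in list2: keep it
                i++;
            } else if (compare > 0) {
                j++;            // the list2 item is smaller: skip it
            } else {
                i++;            // a occurs in both lists: drop it
                j++;
            }
        }
        // list2 is exhausted, so whatever remains in list1 belongs to the difference.
        result.addAll(list1.subList(i, size1));
        return result;
    }

    public static void main(String[] args) {
        List<Integer> diff = difference(Arrays.asList(1, 2, 3, 5), Arrays.asList(2, 5, 7));
        System.out.println(diff); // prints [1, 3]
    }
}
```

Like the intersection, this makes at most n+m comparisons. With a `LinkedList`, `get(i)` is no longer constant time, so an iterator-based walk would be needed to keep the linear bound.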
Maxim
  • Nice solution, but if the two lists contain millions of elements and have only two elements in common, then you'd be badly over-allocating the result list. You might want to trim the result with `trimToSize()` before returning, or just let the ArrayList implementation do appropriate resizing by not using an initial capacity. – DodgyCodeException Sep 14 '18 at 12:41
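
The sizing advice in this comment could be sketched as follows. This is an illustrative variant, not the answer author's code, and the class name `CompactIntersect` is hypothetical. Note that `ArrayList` exposes `trimToSize()` but no public capacity getter, so the sketch starts from the default capacity and trims unconditionally before returning.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical variant of the merge-intersect that avoids over-allocating the result.
final class CompactIntersect {

    /** Intersects two lists assumed to be sorted ascending with unique items. */
    static <T extends Comparable<? super T>> List<T> intersect(List<T> list1, List<T> list2) {
        // Declared as ArrayList rather than List so that trimToSize() is accessible.
        // No initial capacity: the list grows on demand.
        final ArrayList<T> result = new ArrayList<>();

        int i = 0;
        int j = 0;
        while (i < list1.size() && j < list2.size()) {
            T a = list1.get(i);
            int compare = a.compareTo(list2.get(j));
            if (compare < 0) {
                i++;
            } else if (compare > 0) {
                j++;
            } else {
                result.add(a);
                i++;
                j++;
            }
        }

        result.trimToSize(); // release any unused backing-array slots
        return result;
    }

    public static void main(String[] args) {
        List<Integer> common = intersect(Arrays.asList(1, 2, 3, 5), Arrays.asList(2, 5, 7));
        System.out.println(common); // prints [2, 5]
    }
}
```

Whether the final trim is worth its extra copy depends on how long the result lives; for a short-lived list, simply omitting the initial capacity is probably enough.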