
In the example below, I'm trying to return the two smallest elements (nearest neighbours) of which "a" is a member.

So the two smallest elements for "a", based on:

    List((("a","b"),1.0) , (("a","c"),4.0) , (("a","c"),3.0) , (("b","c"),2.0) )

are:

    List((("a","b"),1.0) , (("a","c"),3.0))

Here is my solution :

    val l = List((("a","b"),1.0) , (("a","c"),4.0) , (("a","c"),3.0) , (("b","c"),2.0) )
                                                  //> l  : List[((String, String), Double)] = List(((a,b),1.0), ((a,c),4.0), ((a,c
                                                  //| ),3.0), ((b,c),2.0))
    val justA = l.filter(v => v._1._1 == "a" || v._1._2 == "a").sortBy(_._2).take(2)
                                                  //> justA  : List[((String, String), Double)] = List(((a,b),1.0), ((a,c),3.0))

Is there a more efficient way to do this calculation?


1 Answer


After the filter step, you can apply an algorithm that finds the k smallest remaining elements. This is similar to the question "Algorithm to find k smallest numbers in array of n items".

And to recapitulate the answer I just posted to that question (because while looking up the algorithm I lost track of who had asked which question when):

It is possible to find the k smallest of n elements in O(n) time (by which I mean true O(n) time, not O(n + some function of k)). Refer here -- http://en.wikipedia.org/wiki/Selection_algorithm -- especially the subsections on "unordered partial sorting" and "median selection as pivot strategy", and also here -- http://en.wikipedia.org/wiki/Median_of_medians -- for the essential piece that makes this O(n).
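As a rough illustration of the selection idea, here is a minimal quickselect-style sketch in Scala. Note the assumptions: it picks a random pivot, so it runs in expected O(n) time rather than the worst-case O(n) that a median-of-medians pivot would give, and the names `KSmallest`, `kSmallest`, `key`, and `rng` are made up for this example. It also assumes the keys contain no NaN values.

```scala
import scala.util.Random

object KSmallest {
  // Returns the k smallest elements of xs by `key`, in no particular order.
  // Random pivot: expected O(n), not the worst-case O(n) of median of medians.
  def kSmallest[A](xs: Vector[A], k: Int, rng: Random = new Random(0))(key: A => Double): Vector[A] =
    if (k <= 0) Vector.empty
    else if (k >= xs.length) xs
    else {
      val pivot   = key(xs(rng.nextInt(xs.length)))
      val smaller = xs.filter(x => key(x) < pivot)
      val equal   = xs.filter(x => key(x) == pivot)
      if (k <= smaller.length)
        kSmallest(smaller, k, rng)(key)                    // answer lies entirely below the pivot
      else if (k <= smaller.length + equal.length)
        smaller ++ equal.take(k - smaller.length)          // pivot-valued elements fill the rest
      else {
        val larger = xs.filter(x => key(x) > pivot)
        smaller ++ equal ++ kSmallest(larger, k - smaller.length - equal.length, rng)(key)
      }
    }
}
```

For the data in the question, this would be called on the filtered collection, for example `KSmallest.kSmallest(justA.toVector, 2)(_._2)`, where `justA` here stands for the result of the filter step alone.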

Addendum: If you really only need to find the two smallest elements, ever, not k elements as described in the earlier part of this answer, there are two much simpler O(n) algorithms that you can apply after the filter step.

Algorithm 1. In one pass over the remaining data, find the smallest element. Remove it and set it aside. Then find the smallest element of the ones that remain. You now have the two smallest elements; time to find the first is O(n), time to find the second is O(n), and together this is still O(n).
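Algorithm 1 could look roughly like this in Scala. This is only a sketch: the names `TwoPass`, `Edge`, `minByWeight`, and `twoNearest` are invented for the example, and the filter from the question is folded into the same function.

```scala
object TwoPass {
  type Edge = ((String, String), Double)

  // One linear pass: keep whichever of the two elements has the smaller weight.
  private def minByWeight(es: List[Edge]): Edge =
    es.reduceLeft((a, b) => if (b._2 < a._2) b else a)

  def twoNearest(edges: List[Edge], node: String): List[Edge] = {
    // Filter step: keep only the pairs that mention `node`.
    val candidates = edges.filter { case ((x, y), _) => x == node || y == node }
    candidates match {
      case Nil => Nil
      case _ =>
        val first = minByWeight(candidates)       // first O(n) pass
        val rest  = candidates.diff(List(first))  // removes a single occurrence of `first`
        first :: (if (rest.isEmpty) Nil else List(minByWeight(rest))) // second O(n) pass
    }
  }
}
```

With the list `l` from the question, `TwoPass.twoNearest(l, "a")` gives `List((("a","b"),1.0), (("a","c"),3.0))`.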

Algorithm 2. Using a single-elimination tournament, find the smallest remaining element. This requires n-1 comparisons. The second-smallest element can only have lost to the overall winner, so it is one of the approximately lg(n) elements (using the base-2 logarithm) that the smallest element was compared with; next you find the smallest of those elements in O(log n) time. So this also is O(n), but takes fewer comparisons than Algorithm 1 (roughly n + lg(n) instead of roughly 2n). In your case, however, the comparisons are very fast, so I'd probably just use Algorithm 1.
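For completeness, the tournament in Algorithm 2 might be sketched as follows. The names `Tournament`, `Entrant`, and `twoSmallest` are hypothetical, and when the number of players in a round is odd the last one simply gets a bye, which is one of several reasonable choices.

```scala
import scala.annotation.tailrec

object Tournament {
  // Each entrant remembers the values it has beaten so far.
  private final case class Entrant[A](value: A, beaten: List[A])

  // Returns the smallest element followed by the second smallest (fewer if xs is shorter).
  def twoSmallest[A](xs: List[A])(key: A => Double): List[A] = {
    def smaller(a: A, b: A): A = if (key(b) < key(a)) b else a

    // One round: neighbours are paired up and the smaller key advances.
    def round(players: List[Entrant[A]]): List[Entrant[A]] =
      players.grouped(2).map {
        case List(p, q) =>
          if (key(q.value) < key(p.value)) q.copy(beaten = p.value :: q.beaten)
          else p.copy(beaten = q.value :: p.beaten)
        case other => other.head // odd player out gets a bye
      }.toList

    @tailrec
    def play(players: List[Entrant[A]]): Option[Entrant[A]] = players match {
      case Nil     => None
      case List(w) => Some(w)
      case _       => play(round(players))
    }

    play(xs.map(x => Entrant[A](x, Nil))) match {
      case None    => Nil
      // The runner-up must be among the values the winner beat directly.
      case Some(w) => w.value :: w.beaten.reduceLeftOption(smaller).toList
    }
  }
}
```

Applied to the filtered list from the question, `Tournament.twoSmallest(justA)(_._2)` would return the same two pairs as Algorithm 1, where `justA` again stands for the output of the filter alone. The bookkeeping for the `beaten` lists is extra work, which is part of why the simpler Algorithm 1 is attractive when comparisons are cheap.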

  • Is the time for my implementation, ".sortBy(_._2).take(2)", O(n log n) + O(N), since sortBy is a merge sort and take(2) requires traversing the size of the collection? – blue-sky May 17 '14 at 12:08
  • Yes, if N is the number of items you started with and n is the number that are left after the filter. If you have no way to know in advance how many items the filter might remove, I'd call the worst-case performance O(N log N), because you know only that n <= N. – David K May 17 '14 at 12:56
  • Since k=2 in your question, my original answer actually is overly complicated. I've expanded the answer by adding some simpler algorithms for this particular problem. – David K May 17 '14 at 13:20