I couldn't find the name of this problem by searching, so I hope this question will be useful to the community.

Suppose we have two sorted arrays of numbers, like:

 2  8
12 18
45 35
85 48
87 49
98 59

We want to efficiently find the k (here 10) smallest sums of pairs formed by taking one number from each array. In our case that would be:

 2 +  8 = 10 
 2 + 18 = 20
12 +  8 = 20
12 + 18 = 30
 2 + 35 = 37
12 + 35 = 47
 2 + 48 = 50
 2 + 49 = 51
45 +  8 = 53
12 + 48 = 60

What would be the right approach? I sketched a naive implementation (improved by @sanyash), but it does not make use of the fact that the arrays are sorted, and the problem feels doable in linear time...

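import itertools

def smallest_product(k, arr1, arr2):
    # Only the first k items of each sorted array can appear in the answer.
    product_iter = itertools.product(
        itertools.islice(arr1, k),
        itertools.islice(arr2, k),
    )
    # Sort all candidate pairs by their sum and keep the k smallest.
    product_sorted = sorted(product_iter, key=sum)
    product_sliced = itertools.islice(product_sorted, k)
    return list(product_sliced)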

print(smallest_product(10, 
    [ 2, 12, 45, 85, 87, 98], 
    [ 8, 18, 35, 48, 49, 59]))

Similar question: efficient sorted Cartesian product of 2 sorted array of integers (but it deals with creating a full resulting array, whereas in my case I need just the first few values)

P.S. I added the python tag as it's a math problem, but I'll be happy with a solution in any language, or just an explanation, or a link to Wikipedia...

Klesun
  • Possible duplicate of [How to get the K smallest Products from pairs from two sorted Arrays?](https://stackoverflow.com/questions/56382653/how-to-get-the-k-smallest-products-from-pairs-from-two-sorted-arrays) – Klesun Oct 23 '19 at 19:08

4 Answers

Imagine we create a table using our two arrays:

for arr in [[i + j for j in arr2] for i in arr1]: print(arr)

We get an output like this:

[10, 20, 37, 50, 51, 61]
[20, 30, 47, 60, 61, 71]
[53, 63, 80, 93, 94, 104]
[93, 103, 120, 133, 134, 144]
[95, 105, 122, 135, 136, 146]
[106, 116, 133, 146, 147, 157]

Note that in this matrix, matrix[i][j] == arr1[i] + arr2[j]. So we can compute the value of an element in any position of the matrix in O(1). Notice that this is a sorted matrix, where all rows and columns are monotonically increasing, and we are trying to find the k smallest elements inside of it.

At this stage an O(K log N) heap approach becomes fairly straightforward (plus an O(N) step to build the initial heap). Take the first row and turn it into a min-heap. Each time, pop the smallest element and add it to your result. Every time you pop, push the element directly below it (same column, next row) onto the heap, if there is one. Repeat k times and you have your k smallest sums.
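The heap approach described above can be sketched roughly as follows. The function name `k_smallest_sums` is made up for this illustration, and the sketch assumes both inputs are sorted ascending. It also seeds the heap with only the first `min(k, len(arr2))` cells of the first row, a small shortcut not stated in the answer: any column further right starts with a sum that is no smaller than k sums already in the first row.

```python
import heapq


def k_smallest_sums(k, arr1, arr2):
    """Return the k smallest sums arr1[i] + arr2[j], in ascending order."""
    if not arr1 or not arr2:
        return []
    # Seed the heap with the first row of the implicit matrix.
    # Each entry is (sum, row index, column index).
    heap = [(arr1[0] + arr2[j], 0, j) for j in range(min(k, len(arr2)))]
    heapq.heapify(heap)

    result = []
    while heap and len(result) < k:
        total, i, j = heapq.heappop(heap)
        result.append(total)
        # Replace the popped cell with the one directly below it (same column).
        if i + 1 < len(arr1):
            heapq.heappush(heap, (arr1[i + 1] + arr2[j], i + 1, j))
    return result


print(k_smallest_sums(10, [2, 12, 45, 85, 87, 98], [8, 18, 35, 48, 49, 59]))
# prints [10, 20, 20, 30, 37, 47, 50, 51, 53, 60]
```

Because every column of the matrix is sorted, this is effectively a merge of the columns, so no `seen` set is needed: each cell can only be pushed by the cell directly above it.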

This isn't fully relevant to the situation; however, there do exist variants of saddleback search that let you find the kth smallest element in a sorted matrix in O(N) instead of the O(K log N) of the approach above. There is probably a way to modify the approach taken in this paper to O(K), but it is most likely overkill for this situation.
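To give a feel for the saddleback idea, here is a minimal sketch of a simpler, related technique. It is not the O(N) algorithm from the paper. It counts how many sums are at most `x` with a single staircase walk through the implicit matrix, then binary-searches on the value of the sum. The names `count_at_most` and `kth_smallest_sum` are hypothetical, and the sketch assumes integer inputs, sorted ascending, with 1 <= k <= len(arr1) * len(arr2). Its cost is O((N + M) log R), where R is the range of possible sums.

```python
def count_at_most(arr1, arr2, x):
    """Count pairs (i, j) with arr1[i] + arr2[j] <= x in O(len(arr1) + len(arr2))."""
    count = 0
    j = len(arr2) - 1  # start at the top-right corner of the implicit matrix
    for a in arr1:
        # Moving down a row only increases sums, so j never moves right again.
        while j >= 0 and a + arr2[j] > x:
            j -= 1
        if j < 0:
            break  # every remaining row is entirely greater than x
        count += j + 1
    return count


def kth_smallest_sum(k, arr1, arr2):
    """Return the k-th smallest sum (1-based). Assumes integer inputs."""
    lo, hi = arr1[0] + arr2[0], arr1[-1] + arr2[-1]
    # Find the smallest value x such that at least k sums are <= x.
    while lo < hi:
        mid = (lo + hi) // 2
        if count_at_most(arr1, arr2, mid) >= k:
            hi = mid
        else:
            lo = mid + 1
    return lo


print(kth_smallest_sum(10, [2, 12, 45, 85, 87, 98], [8, 18, 35, 48, 49, 59]))
# prints 60
```

This returns only the kth value, not the k pairs themselves, so for the question as asked the heap approach is still the more direct fit.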

Primusa

You could use a heap:

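import heapq


def smallest_product(k, a, b):
    # There are only len(a) * len(b) pairs in total.
    k = min(k, len(a) * len(b))
    if k == 0:
        return  # nothing to yield; also avoids indexing an empty list below
    # Min-heap of (sum, i, j), starting from the smallest pair a[0] + b[0].
    l = [(a[0] + b[0], 0, 0)]
    heapq.heapify(l)

    # Index pairs already pushed, so that no pair enters the heap twice.
    seen = set()
    for _ in range(k):
        s, i, j = heapq.heappop(l)

        # The next candidates are the neighbours (i + 1, j) and (i, j + 1).
        if i + 1 < len(a) and (i + 1, j) not in seen:
            heapq.heappush(l, (a[i + 1] + b[j], i + 1, j))
            seen.add((i + 1, j))
        if j + 1 < len(b) and (i, j + 1) not in seen:
            heapq.heappush(l, (a[i] + b[j + 1], i, j + 1))
            seen.add((i, j + 1))
        yield (a[i], b[j])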

result = list(smallest_product(10, [ 2, 12, 45, 85, 87, 98], [ 8, 18, 35, 48, 49, 59]))

print(result)

Output

[(2, 8), (2, 18), (12, 8), (12, 18), (2, 35), (12, 35), (2, 48), (2, 49), (45, 8), (12, 48)]

The above code is a Python translation of the one linked here. This method has a time complexity of O(k*log k): each of the k pops pushes at most two new entries, so the heap never holds more than O(k) items.

Output (For k = 11)

[(2, 8), (2, 18), (12, 8), (12, 18), (2, 35), (12, 35), (2, 48), (2, 49), (45, 8), (12, 48), (2, 59)]
Klesun
Dani Mesejo
  • Thanks for the solution! I can't see why at a glance, but for some reason it fails for `k` values greater than `10` (like `11`). `IndexError: list index out of range` `heapq.heappush(l, (a[i] + b[j + 1], i, j + 1))` – Klesun Oct 24 '19 at 13:15
  • @ArturKlesun See now. – Dani Mesejo Oct 24 '19 at 13:29

Truncate both arrays to length k first, then use your previous implementation. This generates O(k^2) candidate pairs (O(k^2 log k) time once the sort is counted):

import itertools

def smallest_product(k, arr1, arr2):
    product_iter = itertools.product(
        itertools.islice(arr1, k),
        itertools.islice(arr2, k),
    )
    product_sorted = sorted(product_iter, key=sum)[:k]
    return list(product_sorted)


print(smallest_product(
    10, 
    [ 2, 12, 45, 85, 87, 98], 
    [ 8, 18, 35, 48, 49, 59])
)
sanyassh
  • Thanks, that'll definitely be an upgrade, though I wonder if there is a way to achieve better results than `O(k^2)`... I'll update my question. – Klesun Oct 23 '19 at 18:57

This problem can be solved in three steps.

  1. Construct a length-m list of iterables of sorted pairs, where m = min(len(list1), k);
  2. Apply m-way merging (see this for more) to the iterables to get a single iterable of sorted pairs, using the sum of each pair as the key;
  3. Take the first k elements from the iterable.

There are different algorithms for m-way merging. The following is a heap-based implementation. The complexity is O(k*log m).

from itertools import islice
from heapq import merge

def smallest_pairs(k, list1, list2):
    pairs = map(lambda x:((x, y) for y in list2), islice(list1, k))
    return list(islice(merge(*pairs, key=sum), k))

print(smallest_pairs(10, 
      [ 2, 12, 45, 85, 87, 98],
      [ 8, 18, 35, 48, 49, 59]))
GZ0