Ordered cartesian product of arrays

Question

The question "efficient sorted Cartesian product of 2 sorted array of integers" suggests a lazy algorithm for generating the ordered Cartesian product of two sorted integer arrays.

I am curious to know whether this algorithm generalises to more than two arrays.

For example, say we have 5 sorted arrays of doubles:

(0.7, 0.2, 0.1)

(0.6, 0.3, 0.1)

(0.5, 0.25, 0.25)

(0.4, 0.35, 0.25)

(0.35, 0.35, 0.3)

I am interested in generating the ordered cartesian product without having to calculate all possible combinations.

I would appreciate any ideas on how a lazy Cartesian product algorithm could be extended beyond 2 dimensions.
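For reference, the eager baseline that the question wants to avoid could be sketched as below. This is only an illustration: the function name `eager_sorted_product` is made up here, the ordering by product (largest first) is taken from the clarification in the comments, and `math.prod` assumes Python 3.8 or later.

```python
from itertools import product
from math import prod

def eager_sorted_product(lists):
    """Materialise every combination, then sort by product, largest first."""
    combos = product(*lists)  # all len(lists[0]) * len(lists[1]) * ... tuples
    return sorted(combos, key=prod, reverse=True)

lists = ((0.7, 0.2, 0.1),
         (0.6, 0.3, 0.1),
         (0.5, 0.25, 0.25),
         (0.4, 0.35, 0.25),
         (0.35, 0.35, 0.3))

ranked = eager_sorted_product(lists)
print(len(ranked))  # 3**5 = 243 combinations are built up front
print(ranked[0])    # the combination with the largest product
```

Every one of the 243 combinations is computed before the first result is available, which is exactly the cost a lazy algorithm would try to defer.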

Let's suppose you have two n-dimensional points: A(A1, ..., An) and B(B1, ..., Bn). How do you compare them? When is A < B? — Lajos Arpad, Aug 19 '14 at 20:06
If A=0.7*0.6*0.5*0.4*0.3=0.0252 and B=0.7*0.6*0.5*0.35*0.35=0.025725 then A < B. — mrod, Aug 20 '14 at 18:45
How do the numbers in 0.7*0.6*0.5*0.4*0.3 and 0.7*0.6*0.5*0.35*0.35 come from the coordinates of A and B respectively? Thanks for the example, but I am strict about understanding exactly what a task is about, so I do not even start thinking about a problem until it is 100% specified. — Lajos Arpad, Aug 20 '14 at 18:53
Let's assume x1=(0.7, 0.2, 0.1) x2=(0.6, 0.3, 0.1) x3=(0.5, 0.25, 0.25) x4=(0.4, 0.35, 0.25) x5=(0.35, 0.35, 0.3). Then A=x1(0)*x2(0)*x3(0)*x4(0)*x5(2) and B=x1(0)*x2(0)*x3(0)*x4(1)*x5(0), where the numbers in parentheses are zero-based indexes into each x array. — mrod, Aug 20 '14 at 19:30
Now the problem is understood. I am thinking about a solution. — Lajos Arpad, Aug 21 '14 at 17:41

score 2 · Answer 1 · answered Jan 28 '16 at 13:09

This problem appears to be an enumeration instance of uniform-cost search (see for example https://en.wikipedia.org/wiki/Dijkstra%27s_algorithm ). Your state space is defined by the set of current indexes pointing into your sorted arrays. The successor function enumerates the possible index increments, one per array. For your given example of 5 arrays, the initial state is (0, 0, 0, 0, 0).

There is no goal-state check because we need to go through all possibilities. The result is guaranteed to be sorted (largest product first) provided every input array is sorted in descending order and contains no negative values.

Assuming we have m arrays of length n each, the complexity of this method is O(n^m log(n(m-1))).

Here is a sample implementation in Python:

from heapq import heappush, heappop

def cost(s, lists):
    """Product of the elements selected by the index tuple s."""
    prod = 1
    for ith, x in zip(s, lists):
        prod *= x[ith]
    return prod

def successor(s, lists):
    """All states reachable from s by advancing exactly one index."""
    successors = []
    for k, (i, x) in enumerate(zip(s, lists)):
        if i < len(x) - 1:
            t = list(s)
            t[k] += 1
            successors.append(tuple(t))
    return successors

def sorted_product(initial_state, lists):
    fringe = []       # min-heap; costs are negated so the largest product pops first
    explored = set()  # states that have already been pushed onto the heap
    heappush(fringe, (-cost(initial_state, lists), initial_state))
    while fringe:
        best = heappop(fringe)[1]
        yield best
        for s in successor(best, lists):
            if s not in explored:
                heappush(fringe, (-cost(s, lists), s))
                explored.add(s)

if __name__ == '__main__':
    lists = ((0.7, 0.2, 0.1),
             (0.6, 0.3, 0.1),
             (0.5, 0.25, 0.25),
             (0.4, 0.35, 0.25),
             (0.35, 0.35, 0.3))
    init_state = tuple([0]*len(lists))
    for s in sorted_product(init_state, lists):
        s_output = [x[i] for i, x in zip(s, lists)]
        v = cost(s, lists)
        print('%s %s \t%s' % (s, s_output, v))
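Because the search is a generator, a caller can stop after the first few results instead of exhausting all n^m states. The following is a compact, self-contained sketch of the same best-first idea combined with `itertools.islice`. The name `lazy_sorted_product` is made up for this illustration, `math.prod` assumes Python 3.8 or later, and the code assumes each list is sorted in descending order and holds no negative values.

```python
from heapq import heappush, heappop
from itertools import islice
from math import prod

def lazy_sorted_product(lists):
    """Yield (product, indexes) pairs from largest to smallest product.

    Assumption: every list is sorted in descending order and has no
    negative values, so advancing an index never increases the product.
    """
    def value(idx):
        return prod(x[i] for i, x in zip(idx, lists))

    start = (0,) * len(lists)
    fringe = [(-value(start), start)]  # min-heap on the negated product
    seen = {start}
    while fringe:
        neg, idx = heappop(fringe)
        yield -neg, idx
        for k, x in enumerate(lists):
            if idx[k] + 1 < len(x):
                nxt = idx[:k] + (idx[k] + 1,) + idx[k + 1:]
                if nxt not in seen:
                    seen.add(nxt)
                    heappush(fringe, (-value(nxt), nxt))

lists = ((0.7, 0.2, 0.1),
         (0.6, 0.3, 0.1),
         (0.5, 0.25, 0.25),
         (0.4, 0.35, 0.25),
         (0.35, 0.35, 0.3))

# Only the states needed for the first three results are expanded.
top3 = list(islice(lazy_sorted_product(lists), 3))
for p, idx in top3:
    print(idx, p)
```

Ties (such as the two 0.35 entries in the last array) are broken by the index tuple, since heap entries compare the tuple when the negated products are equal.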

Thanks, this has helped me enormously with a natural language problem I'm solving in Java. — Sam, Jul 26 '16 at 20:23

score 0 · Answer 2 · answered Aug 21 '14 at 18:44

So, if you have A(A1, ..., An) and B(B1, ..., Bn).

A < B if and only if

A1 * ... * An < B1 * ... * Bn

I'm assuming that every value is positive, because if we allow negatives, then:

(-50, -100, 1) > (1, 2, 3)

as -50 * (-100) * 1 = 5000 > 6 = 1 * 2 * 3
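The comparison above can be sketched in a few lines of Python. The helper name `is_smaller` is hypothetical and `math.prod` assumes Python 3.8 or later; the two points A and B are the ones from the comments under the question.

```python
from math import prod

def is_smaller(a, b):
    """Return True when the product of a's coordinates is below b's."""
    return prod(a) < prod(b)

# With negative values, large magnitudes can flip the expected order.
print(is_smaller((1, 2, 3), (-50, -100, 1)))  # True: 6 < 5000

# The two points from the comments on the question.
A = (0.7, 0.6, 0.5, 0.4, 0.3)
B = (0.7, 0.6, 0.5, 0.35, 0.35)
print(is_smaller(A, B))  # True: about 0.0252 < about 0.025725
```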

Even without negative values, the problem is still rather complex. You need a solution built on a data structure with a depth of k. If (A1, ..., Ak) < (B1, ..., Bk), then we can assume that, once the other dimensions are included, a combination (A1, ..., Ak, ..., An) is probably smaller than a combination (B1, ..., Bk, ..., Bn). Wherever this is not true, we have an exception to the rule. The data structure should hold:

k
the first k elements of A and B respectively
description of the exceptions from the rule

For any such exception, there might be a combination (C1, ..., Ck) which is bigger than (B1, ..., Bk), yet extending (C1, ..., Ck) with values from further dimensions might still produce exceptions to the rule (A1, ..., Ak) < (C1, ..., Ck).

So, if you already know that (A1, ..., Ak) < (B1, ..., Bk), you first have to check whether there are exceptions by finding the first l dimensions where choosing the biggest possible values for A and the smallest possible values for B reverses the comparison. If such an l exists, you should find where the exception starts (which dimension, which index). This describes the exception. When you find an exception, you know that (A1, ..., Ak, ..., Al) > (B1, ..., Bk, ..., Bl), so here the rule is that A is bigger than B, and an exception to it occurs when B becomes bigger than A.

To reflect this, the data-structure would look like:

class Rule {
    int k;
    int[] smallerCombinationIndexes;
    int[] biggerCombinationIndexes;
    List<Rule> exceptions;
}

Whenever you find an exception to a rule, the exception is generated based on prior knowledge. Needless to say, the complexity greatly increases, and the problem is that you have exceptions to the rules, exceptions to the exceptions, and so on. The current approach would tell you, for two random points A and B, whether A is smaller than B. For combinations (A1, ..., Ak) and (B1, ..., Bk), it would also tell you the key indexes at which the result of comparing them changes. Depending on your exact needs, this idea might be enough or might need extensions. So the answer to your question is: yes, you can extend the lazy algorithm to handle further dimensions, but you need to handle the exceptions to the rules to achieve that.

Thank you @LajosArpad for this reply. Your assumption is correct that the numbers are always positive. — mrod, Aug 21 '14 at 18:58
