
I have a list (or array) consisting of n arrays. Each array contains an arbitrary subset of the integers from 0 to n-1 (no number is repeated within an array). An example for n=3 is:

l = [np.array([0, 1]), np.array([0]), np.array([1, 2])]

I want to pick one number from each array as its representative, such that no two arrays have the same representative, and collect them in a list in the same order as the arrays. In other words, the numbers picked for the arrays must be unique, and the whole set of representatives will be a permutation of the numbers 0 to n-1. For the list above, it would uniquely be:

representatives = [1, 0, 2]

There is a guarantee that such a list of representatives exists for our list, but how do we find it? In case there is more than one possible list of representatives, any one of them can be selected.
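For reference, a quick way to check whether a candidate list is valid (a small illustrative helper, not part of my actual code) could be:

import numpy as np

l = [np.array([0, 1]), np.array([0]), np.array([1, 2])]

def is_valid(l, reps):
    # one representative per array, each drawn from its array, all distinct
    return (len(reps) == len(l)
            and all(r in arr for r, arr in zip(reps, l))
            and len(set(reps)) == len(reps))

print(is_valid(l, [1, 0, 2]))  # True
print(is_valid(l, [0, 0, 2]))  # False: 0 is used twice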

Ehsan
  • I'd be surprised if this is not NP-complete. Initial thought is to create a reduction from Maximum Independent Set. – Christian Sloper Jun 25 '20 at 08:46
  • @ChristianSloper Thank you. Tried googling it. Could not find a good source for this case. Could you please point me in the right direction? – Ehsan Jun 25 '20 at 08:52
  • @ChristianSloper here is a small surprise for you ;-) https://en.wikipedia.org/wiki/Matching_(graph_theory) – Paul Panzer Jun 25 '20 at 23:08
  • @PaulPanzer While that is interesting to me, I think Christian meant this: https://en.wikipedia.org/wiki/Independent_set_(graph_theory) which seems to be NP-complete. Not sure how to apply that to this problem though. Thank you. – Ehsan Jun 26 '20 at 00:12

3 Answers


What you are asking for is a maximum matching for the bipartite graph whose left and right sets are indexed by your arrays and their unique elements, respectively.

The networkx module knows how to find such a maximum matching:

import numpy as np
import networkx as nx
import operator as op

def make_example(n, density=0.1):
    # random test case: n arrays over 0..n-1 that are guaranteed to admit a
    # full set of distinct representatives (a random permutation is mixed in)
    rng = np.random.default_rng()
    M = np.unique(np.concatenate([rng.integers(0, n, (int(n*n*density), 2)),
                                  np.stack([np.arange(n), rng.permutation(n)],
                                           axis=1)], axis=0), axis=0)
    # split the sorted (array index, element) pairs wherever the array index changes
    return np.split(M[:, 1], (M[:-1, 0] != M[1:, 0]).nonzero()[0] + 1)

def find_matching(M):
    G = nx.Graph()
    m = len(M)
    n = 1 + max(map(max, M))
    # left nodes n..n+m-1 are the arrays, right nodes 0..n-1 are the elements
    G.add_nodes_from(range(n, m+n), bipartite=0)
    G.add_nodes_from(range(n), bipartite=1)
    G.add_edges_from((i, j) for i, r in enumerate(M, n) for j in r)
    # passing top_nodes keeps the matching unambiguous if the graph is disconnected
    matching = nx.bipartite.maximum_matching(G, top_nodes=range(n, m+n))
    return op.itemgetter(*range(n, m+n))(matching)

Example:

>>> M = make_example(10,0.4)
>>> M
[array([0, 4, 8]), array([9, 3, 5]), array([7, 1, 3, 4, 5, 7, 8]), array([9, 0, 4, 5]), array([9, 0, 1, 3, 5]), array([6, 0, 1, 2, 8]), array([9, 3, 5, 7]), array([8, 1, 2, 5]), array([6]), array([7, 0, 1, 4, 6])]
>>> find_matching(M)
(0, 9, 5, 4, 1, 2, 3, 8, 6, 7)

This can do thousands of elements in a few seconds:

>>> import time
>>> M = make_example(10000,0.01)
>>> t0,sol,t1 = [time.perf_counter(),find_matching(M),time.perf_counter()]
>>> print(t1-t0)
3.822795882006176
Paul Panzer

Is this what you're looking for?

def pick_one(a, index, buffer, visited):
    # every array has been assigned a representative
    if index == len(a):
        return True
    for item in a[index]:
        if item not in visited:
            # tentatively assign item to array `index` and recurse
            buffer.append(item)
            visited.add(item)
            if pick_one(a, index + 1, buffer, visited):
                return True
            # backtrack: no valid assignment was found further down
            buffer.pop()
            visited.remove(item)
    return False


a = [[0, 1], [0], [1, 2]]
buffer = []
pick_one(a, 0, buffer, set())
print(buffer)

Output:

[1, 0, 2]
Balaji Ambresh
  • Yes. Thank you. This is quite a brute-force search. If there is no better solution suggested, I will accept it as answer. – Ehsan Jun 25 '20 at 08:57
  • Doesn't the version before your edit do the same as this one? What is the reason for using `visited` rather than `buffer` to check the already visited `item`s? – Ehsan Jun 25 '20 at 09:38
  • Checking membership in a list takes linear time; a set reduces it to constant time. – Balaji Ambresh Jun 25 '20 at 09:40

You can use a flow network. This is how you do it:

  1. You have k arbitrary subsets of the integers from 0 to n-1; connect each subset to the sink.
  2. Connect the source to every number from 0 to n-1.
  3. Connect each number to the subsets that contain it (number i is connected to subset k_i).
  4. Set the capacity of every edge to 1.

After constructing the flow network, find the max flow using a known algorithm such as Edmonds–Karp, which gives a time complexity of O(m^2*n); a code sketch of the construction follows below.

This works for the following reasons:

  1. There is a single unit-capacity edge from the source to each number, and a single unit-capacity edge from each subset to the sink, so no number can be chosen for two different subsets and no subset gets more than one representative.
  2. When the algorithm finds the max flow, it is guaranteed (since a full set of representatives exists) to saturate the edge from every subset to the sink, i.e. every subset receives a representative.
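
A minimal sketch of this construction, assuming networkx for the max-flow step (the answer does not name a library) and using illustrative node labels like ('num', i) and ('set', k):

import networkx as nx
from networkx.algorithms.flow import edmonds_karp

def representatives_by_max_flow(arrays):
    # build source -> number -> subset -> sink, every edge with capacity 1
    G = nx.DiGraph()
    n = len(arrays)
    for num in range(n):
        G.add_edge('source', ('num', num), capacity=1)
    for k, arr in enumerate(arrays):
        G.add_edge(('set', k), 'sink', capacity=1)
        for num in arr:
            G.add_edge(('num', int(num)), ('set', k), capacity=1)
    # max flow via Edmonds-Karp; each saturated number->subset edge is an assignment
    _, flow = nx.maximum_flow(G, 'source', 'sink', flow_func=edmonds_karp)
    reps = [None] * n
    for num in range(n):
        for (_, k), f in flow[('num', num)].items():
            if f == 1:
                reps[k] = num
    return reps

l = [[0, 1], [0], [1, 2]]
print(representatives_by_max_flow(l))   # [1, 0, 2]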
snirma