
Given a NumPy array, I can subset it to the indices of the elements that meet a given criterion. How do I create tuples of triplets (or quadruplets, quintuplets, ...) from the resulting pairs of indices?

In the example below, pairs_tuples is equal to [(1, 0), (3, 0), (3, 1), (3, 2)]. triplets_tuples should be [(0, 1, 3)] because all of its pairs (i.e. (1, 0), (3, 0), (3, 1)) have values meeting the condition, whereas (3, 2) cannot be extended to a triplet: index 2 is not paired with any other index that meets the condition.

import numpy as np

a = np.array([[0.        , 0.        , 0.        , 0.        , 0.      ],
              [0.96078379, 0.        , 0.        , 0.        , 0.      ],
              [0.05498203, 0.0552454 , 0.        , 0.        , 0.      ],
              [0.46005028, 0.45468466, 0.11167813, 0.        , 0.      ],
              [0.1030161 , 0.10350956, 0.00109096, 0.00928037, 0.      ]])

pairs = np.where((a >= .11) & (a <= .99))
pairs_tuples = list(zip(pairs[0].tolist(), pairs[1].tolist()))
# [(1, 0), (3, 0), (3, 1), (3, 2)]
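The condition described above ("all pairs within the group meet the condition") can be made concrete as a check over every pair inside a candidate group. This is a minimal sketch that hardcodes the pair list printed above; `pair_set` and `is_valid_group` are hypothetical names chosen for illustration. Pairs are stored as frozensets so that (1, 0) and (0, 1) compare equal.

```python
from itertools import combinations

# Pairs copied from the output above: each (row, col) has row > col
pairs_tuples = [(1, 0), (3, 0), (3, 1), (3, 2)]

# Store each pair as a frozenset so that (1, 0) and (0, 1) are the same key
pair_set = {frozenset(p) for p in pairs_tuples}


def is_valid_group(group, pair_set):
    """Hypothetical helper: True if every pair of indices in `group` is in `pair_set`."""
    return all(frozenset(p) in pair_set for p in combinations(group, 2))


print(is_valid_group((0, 1, 3), pair_set))  # True
print(is_valid_group((0, 2, 3), pair_set))  # False, (0, 2) is not a pair
```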

How do I get to the following?

triplets_tuples = [(0, 1, 3)]
quadruplets_tuples = []
quintuplets_tuples = []
Eric
  • Can you define "all of its elements" more precisely? How do you get from `(0, 1, 3)` to `[(1, 0), (3, 0), (3, 1)]`? Why does `(0, 1)` not appear in that set? – Eric Jul 10 '18 at 13:31
  • I think this is [the clique problem](https://en.wikipedia.org/wiki/Clique_problem) – Eric Jul 10 '18 at 13:51
  • Is your input matrix always lower-triangular? – Eric Jul 10 '18 at 13:51
  • Thanks for your comments and questions. The input matrix can be thought of as the lower triangle of a correlation matrix. The upper triangle would just mirror it, so pair (1, 0) is equivalent to (interchangeable with) pair (0, 1). I would like to find triplets (or n-lets, where n > 2) of indices that all meet the condition pairwise, i.e. in the example (0, 1, 3) is the only valid triplet because all pairs within it, (1, 0), (3, 0) and (3, 1), meet the condition. – user7105520 Jul 10 '18 at 14:13
  • `np.transpose(pairs)` simplifies getting the pairs. – hpaulj Jul 10 '18 at 14:33
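hpaulj's suggestion can be sketched as follows: `np.transpose` turns the `(rows, cols)` tuple returned by `np.where` into an (N, 2) array of index pairs, and `np.argwhere` builds the same array from the boolean mask in one call. The name `pairs_array` is only illustrative.

```python
import numpy as np

a = np.array([[0.        , 0.        , 0.        , 0.        , 0.      ],
              [0.96078379, 0.        , 0.        , 0.        , 0.      ],
              [0.05498203, 0.0552454 , 0.        , 0.        , 0.      ],
              [0.46005028, 0.45468466, 0.11167813, 0.        , 0.      ],
              [0.1030161 , 0.10350956, 0.00109096, 0.00928037, 0.      ]])

mask = (a >= .11) & (a <= .99)

# np.transpose turns the (rows, cols) tuple from np.where into an (N, 2) array
pairs_array = np.transpose(np.where(mask))

# np.argwhere(mask) builds the same (N, 2) array in a single call
assert (pairs_array == np.argwhere(mask)).all()

pairs_tuples = [tuple(p) for p in pairs_array.tolist()]
print(pairs_tuples)  # [(1, 0), (3, 0), (3, 1), (3, 2)]
```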

This has an easy part and an NP-hard part. Here's the solution to the easy part.
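To see why the second part is the expensive one, here is a brute-force sketch (not from the original answer) that tests every k-subset of indices directly, using the symmetric matrix `c = a + a.T`. It is fine for a 5x5 matrix, but the number of candidate subsets, C(n, k), grows combinatorially with n and k. `k_groups_brute_force` is a hypothetical helper name.

```python
from itertools import combinations

import numpy as np

a = np.array([[0.        , 0.        , 0.        , 0.        , 0.      ],
              [0.96078379, 0.        , 0.        , 0.        , 0.      ],
              [0.05498203, 0.0552454 , 0.        , 0.        , 0.      ],
              [0.46005028, 0.45468466, 0.11167813, 0.        , 0.      ],
              [0.1030161 , 0.10350956, 0.00109096, 0.00928037, 0.      ]])

c = a + a.T                          # full symmetric matrix
adj = (c >= .11) & (c <= .99)        # True where a pair meets the condition


def k_groups_brute_force(adj, k):
    """Hypothetical helper: every k-tuple of indices whose pairs are all adjacent.

    Checks all C(n, k) index combinations, so the cost grows
    combinatorially with n and k.
    """
    n = adj.shape[0]
    return [
        combo
        for combo in combinations(range(n), k)
        if all(adj[i, j] for i, j in combinations(combo, 2))
    ]


triplets_tuples = k_groups_brute_force(adj, 3)     # [(0, 1, 3)]
quadruplets_tuples = k_groups_brute_force(adj, 4)  # []
quintuplets_tuples = k_groups_brute_force(adj, 5)  # []
```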

Let's assume you have the full correlation matrix:

>>> c = a + a.T
>>> c
array([[0.        , 0.96078379, 0.05498203, 0.46005028, 0.1030161 ],
       [0.96078379, 0.        , 0.0552454 , 0.45468466, 0.10350956],
       [0.05498203, 0.0552454 , 0.        , 0.11167813, 0.00109096],
       [0.46005028, 0.45468466, 0.11167813, 0.        , 0.00928037],
       [0.1030161 , 0.10350956, 0.00109096, 0.00928037, 0.        ]])
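`a + a.T` works here because the diagonal of `a` is zero; if the lower triangle came with nonzero diagonal entries (a real correlation matrix has 1s there), they would be doubled. A slightly more defensive sketch, under that assumption, mirrors only the strictly lower triangle. `full_from_lower` is a hypothetical helper name.

```python
import numpy as np

a = np.array([[0.        , 0.        , 0.        , 0.        , 0.      ],
              [0.96078379, 0.        , 0.        , 0.        , 0.      ],
              [0.05498203, 0.0552454 , 0.        , 0.        , 0.      ],
              [0.46005028, 0.45468466, 0.11167813, 0.        , 0.      ],
              [0.1030161 , 0.10350956, 0.00109096, 0.00928037, 0.      ]])


def full_from_lower(lower):
    """Hypothetical helper: mirror the strictly lower triangle into a symmetric matrix.

    np.tril(..., k=-1) drops the diagonal, so the result has zeros there
    even if `lower` had nonzero diagonal entries (a + a.T would double them).
    """
    strict = np.tril(lower, k=-1)
    return strict + strict.T


c = full_from_lower(a)
assert np.allclose(c, c.T)        # symmetric
assert np.allclose(c, a + a.T)    # identical here because the diagonal of a is 0
```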

What you're doing is converting this into an adjacency matrix:

>>> adj = (c >= .11) & (c <= .99)
>>> adj.astype(int)  # for readability below - False and True take a lot of space
array([[0, 1, 0, 1, 0],
       [1, 0, 0, 1, 0],
       [0, 0, 0, 1, 0],
       [1, 1, 1, 0, 0],
       [0, 0, 0, 0, 0]])
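A quick sanity check on the mask: an undirected graph's adjacency matrix must be symmetric, and the True entries in each row are that node's neighbours. A minimal sketch, built from the symmetric matrix `c`; `neighbours` is just an illustrative name.

```python
import numpy as np

a = np.array([[0.        , 0.        , 0.        , 0.        , 0.      ],
              [0.96078379, 0.        , 0.        , 0.        , 0.      ],
              [0.05498203, 0.0552454 , 0.        , 0.        , 0.      ],
              [0.46005028, 0.45468466, 0.11167813, 0.        , 0.      ],
              [0.1030161 , 0.10350956, 0.00109096, 0.00928037, 0.      ]])

c = a + a.T
adj = (c >= .11) & (c <= .99)

# An undirected graph's adjacency matrix must be symmetric
assert (adj == adj.T).all()

# Neighbours of each node: the column indices of the True entries in its row
neighbours = {i: set(np.flatnonzero(row).tolist()) for i, row in enumerate(adj)}
print(neighbours)  # {0: {1, 3}, 1: {0, 3}, 2: {3}, 3: {0, 1, 2}, 4: set()}
```

Node 4 has no neighbours, which is why it later shows up as a clique of size one.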

This now represents a graph where rows and columns correspond to nodes, and a 1 is an edge between them. We can use networkx to visualize this:

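import matplotlib.pyplot as plt
import networkx
g = networkx.from_numpy_array(adj)  # from_numpy_matrix was removed in networkx 3.0
networkx.draw(g, with_labels=True)  # label nodes with their indices
plt.show()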

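The graph's edges are exactly the question's pairs. As a networkx-free check, the edge list can be read straight off the upper triangle of the symmetric mask (each undirected edge appears twice in a symmetric matrix). For this graph, `g.edges()` should report the same four edges, though possibly in a different order. `edges` is an illustrative name.

```python
import numpy as np

a = np.array([[0.        , 0.        , 0.        , 0.        , 0.      ],
              [0.96078379, 0.        , 0.        , 0.        , 0.      ],
              [0.05498203, 0.0552454 , 0.        , 0.        , 0.      ],
              [0.46005028, 0.45468466, 0.11167813, 0.        , 0.      ],
              [0.1030161 , 0.10350956, 0.00109096, 0.00928037, 0.      ]])

c = a + a.T
adj = (c >= .11) & (c <= .99)

# Keep only the upper triangle so each undirected edge is listed once
edges = [tuple(e) for e in np.argwhere(np.triu(adj)).tolist()]
print(edges)  # [(0, 1), (0, 3), (1, 3), (2, 3)]

# Same pairs as pairs_tuples in the question, with each pair's indices swapped
pairs_tuples = [(1, 0), (3, 0), (3, 1), (3, 2)]
assert sorted((j, i) for i, j in pairs_tuples) == edges
```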

You're looking for maximal fully connected subgraphs, or "cliques", within this graph. This is the clique problem, and it is the NP-hard part. Thankfully, networkx can solve that too:

>>> list(networkx.find_cliques(g))
[[3, 0, 1], [3, 2], [4]]

Here [3, 0, 1] is one of your triplets.
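`find_cliques` returns only maximal cliques, so to get the question's `triplets_tuples`, `quadruplets_tuples`, ... lists, each maximal clique still has to be expanded: every k-subset of a clique is itself a clique. A minimal sketch, using the output printed above as input (the order of cliques may differ between networkx versions); `k_cliques_from_maximal` is a hypothetical helper name.

```python
from itertools import combinations

# Maximal cliques as printed by networkx.find_cliques above
maximal_cliques = [[3, 0, 1], [3, 2], [4]]


def k_cliques_from_maximal(maximal_cliques, k):
    """Hypothetical helper: all k-subsets of the maximal cliques, de-duplicated.

    Maximal cliques can overlap, so the same k-subset may come from
    several of them; a set removes the repeats.
    """
    found = set()
    for clique in maximal_cliques:
        found.update(combinations(sorted(clique), k))
    return sorted(found)


triplets_tuples = k_cliques_from_maximal(maximal_cliques, 3)     # [(0, 1, 3)]
quadruplets_tuples = k_cliques_from_maximal(maximal_cliques, 4)  # []
quintuplets_tuples = k_cliques_from_maximal(maximal_cliques, 5)  # []
```

The expansion matters once a maximal clique has more than k members: a 4-clique would be reported by `find_cliques` as a single quadruplet, yet it also contains four triplets. networkx also provides `networkx.enumerate_all_cliques(g)`, which yields every clique (not only maximal ones) in order of increasing size, so filtering its output by `len(clique) == k` is another option.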

Eric