Contingency matrix to 1D format in Python

Question

2x2 contingency matrix:

2 1
1 0

Translates to:

[[ 0 0 0 1 ]
 [ 0 0 1 0 ]]

The contingency matrix represents the outcome of two clustering algorithms, Ci and Cj, each with two clusters. Rows correspond to the clusters of Ci and columns to the clusters of Cj. The row sums indicate that Ci has three data points in, say, cluster 1 and one data point in, say, cluster 2. The column sums indicate that Cj has three data points in, say, cluster A and one data point in, say, cluster B. The diagonal entries sum to two, so both algorithms "agree" on two out of N = 4 data points.
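
To make that reading of the matrix concrete, here is a small sketch that computes the cluster sizes and the agreement count for the 2x2 example. It assumes the matrix is [[2, 1], [1, 0]] (the values quoted in the comments), that rows belong to Ci and columns to Cj, and that "agreement" means the diagonal entries. The variable names are illustrative only.

```python
# Assumed example matrix: rows = clusters of Ci, columns = clusters of Cj.
contingency = [[2, 1],
               [1, 0]]

# Row sums: how many points Ci puts in each of its clusters.
ci_sizes = [sum(row) for row in contingency]

# Column sums: how many points Cj puts in each of its clusters.
cj_sizes = [sum(col) for col in zip(*contingency)]

# Diagonal sum: points that land in the same cluster index in both labelings.
agreements = sum(contingency[k][k] for k in range(len(contingency)))

n_points = sum(ci_sizes)

print(ci_sizes)                    # [3, 1]
print(cj_sizes)                    # [3, 1]
print(agreements, "of", n_points)  # 2 of 4
```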

Since there is no adjusted mutual information (AMI) function that takes a contingency matrix as input, I would like to transform the contingency matrix into 1D inputs for the sklearn implementation of AMI.

Is there an efficient way to rewrite an NxN contingency matrix as 1D vectors in Python code?

It would look something like:

V1 = []
V2 = []
for each row index i:
    for each column index j:
        append contingency[i][j] copies of i to V1 and of j to V2

The output should always be two vectors. Another example:

2 0 0
0 1 0
0 0 1

Would lead to two 1D vectors:

0 0 1 2
0 0 1 2
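
One way to sanity-check such a pair of vectors is to count the label pairs back into a matrix and compare it with the original. The sketch below does that for the 3x3 example using only the standard library. The helper name `rebuild_contingency` and its parameters are hypothetical, chosen for illustration.

```python
from collections import Counter

def rebuild_contingency(v1, v2, n_rows, n_cols):
    """Count (row label, column label) pairs back into a 2D list."""
    pair_counts = Counter(zip(v1, v2))
    # A Counter returns 0 for pairs that never occur.
    return [[pair_counts[(i, j)] for j in range(n_cols)]
            for i in range(n_rows)]

v1 = [0, 0, 1, 2]
v2 = [0, 0, 1, 2]
rebuilt = rebuild_contingency(v1, v2, 3, 3)
print(rebuilt)  # [[2, 0, 0], [0, 1, 0], [0, 0, 1]]
```

If the rebuilt matrix equals the input, no counts were lost in the expansion. The order of elements within the vectors does not matter, as long as both vectors are ordered consistently.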

I have no idea what you're asking. You've posted LaTeX code there -- is that relevant to the question at all? You can't really express a 2D matrix in 1D, but of course Python supports 2D matrices. What do you expect to DO with this data? — Tim Roberts, Jul 20 '22 at 18:47
@Tim I imagine OP tried to format their matrix. It would be better to use a markdown table, or simple text in between triple backticks. — mozway, Jul 20 '22 at 18:51
If you can explain how `[[2,1],[1,0]]` becomes `[[0,0,0,1],[0,0,1,0]]`, then I'm sure we can come up with code to do it. Neither of those is 1D, of course. — Tim Roberts, Jul 20 '22 at 20:32
@TimRoberts Indeed, LaTeX was for formatting purposes. The contingency matrix represents two clustering outcomes, each having two clusters. But I'll edit the question. — Sean_TBI_Research, Jul 20 '22 at 21:53
Please provide a reference implementation which includes inputs and outputs. — Mad Physicist, Jul 20 '22 at 22:11

Tim Roberts · Accepted Answer · 2022-07-21T17:44:54.063

Well, this solves the problem as you have stated it. The result v is a list of two lists and can be converted to a numpy array. In general, v needs one empty list per dimension of c, so two for a 2D matrix.


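def produce_vectors(c):
    # One output list per dimension of c: v[0] holds row labels, v[1] column labels.
    v = [[], []]

    for i, row in enumerate(c):
        for j, val in enumerate(row):
            # Cell (i, j) contributes `val` points labelled i in v[0] and j in v[1].
            v[0].extend([i] * val)
            v[1].extend([j] * val)
    return v

c = [[2, 1], [1, 0]]
print(produce_vectors(c))
c = [[2, 0, 0], [0, 1, 0], [0, 0, 1]]
print(produce_vectors(c))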

Output:

[[0, 0, 0, 1], [0, 0, 1, 0]]
[[0, 0, 1, 2], [0, 0, 1, 2]]

edited Jul 21 '22 at 17:44

answered Jul 21 '22 at 00:45

Tim Roberts


The final output should ALWAYS be two vectors, even if c is larger than 2x2 or non-squared, as input for sklearn.metrics.adjusted_mutual_info_score(labels_true, labels_pred, *, average_method='arithmetic') – Sean_TBI_Research Jul 21 '22 at 07:23
Is there a way to make the size of v flexible depending on c? – Sean_TBI_Research Jul 21 '22 at 12:24
It already does that. Have you looked at the code? The two output vectors grow as required. Each will end up as long as the sum of all the values in `c`. The NUMBER of vectors in `v` depends only on the number of DIMENSIONS in `c`. Since it is 2D, there will be 2 vectors. – Tim Roberts Jul 21 '22 at 17:42
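
For completeness, a minimal sketch of feeding the two vectors into the sklearn function named in the comment above. It assumes scikit-learn is installed and repeats the expansion loop from the answer so the snippet runs on its own. Which vector is passed as labels_true and which as labels_pred is an arbitrary choice made here.

```python
from sklearn.metrics import adjusted_mutual_info_score

def produce_vectors(c):
    # Same expansion as in the answer: one label per counted data point.
    row_labels, col_labels = [], []
    for i, row in enumerate(c):
        for j, val in enumerate(row):
            row_labels.extend([i] * val)
            col_labels.extend([j] * val)
    return row_labels, col_labels

# 3x3 example from the question: both clusterings are identical.
c = [[2, 0, 0], [0, 1, 0], [0, 0, 1]]
labels_true, labels_pred = produce_vectors(c)
score = adjusted_mutual_info_score(labels_true, labels_pred)
print(score)  # identical labelings, so the score should be at its maximum

# A non-square matrix still yields exactly two vectors of equal length.
rect_true, rect_pred = produce_vectors([[2, 1, 0], [0, 1, 3]])
print(len(rect_true), len(rect_pred))  # 7 7
```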

mozway · Answer 2 · 2022-07-21T02:45:31.687

A numpy implementation could take advantage of numpy.repeat:

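import numpy as np

# input contingency matrix
a = np.array([[2, 1], [1, 0]])
# fixed "cluster id" matrix (must have the same shape as a)
b = np.array([[0, 1], [0, 1]])
# for this b, the column-major ravel gives row labels and the row-major ravel column labels
out = np.vstack([np.repeat(b.ravel('F'), a.ravel()),
                 np.repeat(b.ravel(), a.ravel())
                 ])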

Output:

array([[0, 0, 0, 1],
       [0, 0, 1, 0]])

Other example with [[5,4],[0,3]] as input:

array([[0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1],
       [0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1]])

You can also use cluster ids other than 0/1, if wanted (example with a = np.array([[5,4],[0,3]]) ; b = np.array([[0,1],[2,3]])):

array([[0, 0, 0, 0, 0, 2, 2, 2, 2, 3, 3, 3],
       [0, 0, 0, 0, 0, 1, 1, 1, 1, 3, 3, 3]])

edited Jul 21 '22 at 02:45

answered Jul 21 '22 at 02:40

mozway


Is there a way to make this work for any NxN contingency matrix? Now I am getting the error: "operands could not be broadcast together with shape" – Sean_TBI_Research Jul 21 '22 at 11:09
can you provide a larger example? – mozway Jul 21 '22 at 11:31
Another example is in the question – Sean_TBI_Research Jul 21 '22 at 12:18
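
The broadcast error presumably comes from b being fixed at 2x2 while a is larger, so the two arguments of numpy.repeat no longer have matching lengths. A possible sketch that derives the index matrices from the shape of a instead, using numpy.indices, is shown below. This is an illustration and not part of the answer above. The function name `contingency_to_labels` is hypothetical.

```python
import numpy as np

def contingency_to_labels(a):
    """Expand a 2D count matrix into two 1D label vectors."""
    a = np.asarray(a)
    # rows[i, j] == i and cols[i, j] == j, both with the same shape as a.
    rows, cols = np.indices(a.shape)
    counts = a.ravel()
    # Repeat each cell's row/column index as many times as its count.
    return np.repeat(rows.ravel(), counts), np.repeat(cols.ravel(), counts)

v1, v2 = contingency_to_labels([[2, 0, 0], [0, 1, 0], [0, 0, 1]])
print(v1)  # [0 0 1 2]
print(v2)  # [0 0 1 2]
```

Because the indices are built from a.shape, the same function also accepts non-square matrices and always returns exactly two vectors.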
