Simultaneously diagonalize matrices with numpy

Question

I have a m × n × n numpy.ndarray of m simultaneously diagonalizable square matrices and would like to use numpy to obtain their simultaneous eigenvalues.

For example, if I had

from numpy import einsum, diag, array, linalg, random
U = linalg.svd(random.random((3,3)))[2]

M = einsum(
    "ij, ajk, lk",
    U, [diag([2,2,0]), diag([1,-1,1])], U)

the two matrices in M are simultaneously diagonalizable, and I am looking for a way to obtain the array

array([[2.,  1.],
       [2., -1.],
       [0.,  1.]])

(up to permutation of the lines) from M. Is there a built-in or easy way to get this?

I think the way to go is the one following from the answer in http://math.stackexchange.com/questions/12220/symmetrize-eigenvectors-of-degenerate-repeated-eigenvalue, though I still appreciate a good implementation of those steps. — Anaphory, Feb 21 '13 at 17:27

score 8 · Answer 1 · answered Mar 12 '15 at 18:27

There is a fairly simple and very elegant simultaneous diagonalization algorithm based on Givens rotation that was published by Cardoso and Soulomiac in 1996:

Cardoso, J., & Souloumiac, A. (1996). Jacobi Angles for Simultaneous Diagonalization. SIAM Journal on Matrix Analysis and Applications, 17(1), 161–164. doi:10.1137/S0895479893259546

I've attached a numpy implementation of the algorithm at the end of this response. Caveat: It turns out simultaneous diagonalization is a bit of a tricky numerical problem, with no algorithm (to the best of my knowledge) that guarantees global convergence. However, the cases in which it does not work (see the paper) are degenerate and in practice I have never had the Jacobi angles algorithm fail on me.

#!/usr/bin/env python2.7
# -*- coding: utf-8 -*-
"""
Routines for simultaneous diagonalization
Arun Chaganty <arunchaganty@gmail.com>
"""

import numpy as np
from numpy import zeros, eye, diag
from numpy.linalg import norm

def givens_rotate( A, i, j, c, s ):
    """
    Rotate A along axis (i,j) by c and s
    """
    Ai, Aj = A[i,:], A[j,:]
    A[i,:], A[j,:] = c * Ai + s * Aj, c * Aj - s * Ai 

    return A

def givens_double_rotate( A, i, j, c, s ):
    """
    Rotate A along axis (i,j) by c and s
    """
    Ai, Aj = A[i,:], A[j,:]
    A[i,:], A[j,:] = c * Ai + s * Aj, c * Aj - s * Ai 
    A_i, A_j = A[:,i], A[:,j]
    A[:,i], A[:,j] = c * A_i + s * A_j, c * A_j - s * A_i 

    return A

def jacobi_angles( *Ms, **kwargs ):
    r"""
    Simultaneously diagonalize using Jacobi angles
    @article{SC-siam,
       HTML =   "ftp://sig.enst.fr/pub/jfc/Papers/siam_note.ps.gz",
       author = "Jean-Fran\c{c}ois Cardoso and Antoine Souloumiac",
       journal = "{SIAM} J. Mat. Anal. Appl.",
       title = "Jacobi angles for simultaneous diagonalization",
       pages = "161--164",
       volume = "17",
       number = "1",
       month = jan,
       year = {1995}}

    (a) Compute Givens rotations for every pair of indices (i,j) i < j 
            - from eigenvectors of G = gg'; g = A_ij - A_ji, A_ij + A_ji
            - Compute c, s as \sqrt{x+r/2r}, y/\sqrt{2r(x+r)}
    (b) Update matrices by multiplying by the givens rotation R(i,j,c,s)
    (c) Repeat (a) until stopping criterion: sin theta < threshold for all ij pairs
    """

    assert len(Ms) > 0
    m, n = Ms[0].shape
    assert m == n

    sweeps = kwargs.get('sweeps', 500)
    threshold = kwargs.get('eps', 1e-8)
    rank = kwargs.get('rank', m)

    R = eye(m)

    for _ in xrange(sweeps):
        done = True
        for i in xrange(rank):
            for j in xrange(i+1, m):
                G = zeros((2,2))
                for M in Ms:
                    g = np.array([ M[i,i] - M[j,j], M[i,j] + M[j,i] ])
                    G += np.outer(g,g) / len(Ms)
                # Compute the eigenvector directly
                t_on, t_off = G[0,0] - G[1,1], G[0,1] + G[1,0]
                theta = 0.5 * np.arctan2( t_off, t_on + np.sqrt( t_on*t_on + t_off * t_off) )
                c, s = np.cos(theta), np.sin(theta)

                if abs(s) > threshold:
                    done = False
                    # Update the matrices and V
                    for M in Ms:
                        givens_double_rotate(M, i, j, c, s)
                        #assert M[i,i] > M[j, j]
                    R = givens_rotate(R, i, j, c, s)

        if done:
            break
    R = R.T

    L = np.zeros((m, len(Ms)))
    err = 0
    for i, M in enumerate(Ms):
        # The off-diagonal elements of M should be 0
        L[:,i] = diag(M)
        err += norm(M - diag(diag(M)))

    return R, L, err

I tried this program but the result is not right, I think it should be -t_on + np.sqrt(t_on * t_on + t_off * t_off) in the line where theta is calculated, G is a symmetric matrix [[a, b],[b, c]], the largest eigenvalue lambda is (a+c+sqrt((a-c) ** 2+4 * b ** 2)/2, let the eigenvector corresponds to this eigenvalue be [[1],[t]] (without normalization), then a + b*t = lambda, t = (lambda - a)/b = (-a+c+sqrt((a-c) ** 2+4 * b ** 2)/2/b = (-t_on + sqrt(t_on ** 2 + t_off **2)/t_off. You can print out the square sum of the off diagonal elements in every outer loop to see if it decreases monotonically — Ziqian Xie, Sep 09 '16 at 18:57
Also I found that using arctan2 may have numerical issue and doesn't converge. I am using x, y = 1/(1+t ** 2), t/(1+t ** 2), c = sqrt(0.5 + x/2), s = y/2/c instead. — Ziqian Xie, Sep 09 '16 at 19:02
-1 **This code is not correct.** For example, it fails to simultaneously diagonalize the identity matrix {{1, 0}, {0, 1}} and the X matrix {{0, 1}, {1, 0}}. It should return an R such as {{1, 1}, {1, -1}}/sqrt(2), but instead it returns an identity matrix for R. — Craig Gidney, Oct 26 '17 at 22:41

Bálint Aradi · Answer 2 · 2013-02-22T06:22:33.857

4

I am not aware of any direct solution. But why not just getting the eigenvalues and the eigenvectors of the first matrix, and using the eigenvectors to transform all other matrices to the diagonal form? Something like:

eigvals, eigvecs = np.linalg.eig(matrix1)
eigvals2 = np.diagonal(np.dot(np.dot(transpose(eigvecs), matrix2), eigvecs))

You can the add the columns to an array via hstack if you like.

UPDATE: As pointed out below, this is only valid if no degenerate eigenvalues occur. Otherwise one would have to check first for the degenerate eigenvalues, then transform the 2nd matrix to a blockdiagonal form, and diagonalize eventual blocks bigger than 1x1 separately.

edited Feb 22 '13 at 06:22

answered Feb 21 '13 at 17:31

Bálint Aradi

3,754
16
22

1

Because these eigenvectors are not necessarily eigenvectors of the second matrix, so the diagonal might not be the eigenvalues (!) of the second matrix. If as in the example in my question 2 is a double eigenvalue of the first matrix, I get two “random” eigenvectors for that eigenvalue. These are not necessarily eigenvectors for 1 and -1 respectively of the second matrix, the cases where they are are a zero set. – Anaphory Feb 22 '13 at 00:43
I agree, in case of degenerated eigenvalues things are somewhat more complicated as you would end up with a blockdiagonal matrix. But then, you can diagonalize those blocks bigger than 1x1 separately. It would be still faster as to diagonalize the whole 2nd matrix. – Bálint Aradi Feb 22 '13 at 06:15
Splitting the partly diagonalized matrices into the blocks and then diagonalizing the blocks is easier described than written in python, in particular considering numerical inaccuracies, and that's why I asked for a good way to do this. – Anaphory Feb 22 '13 at 11:36

score 1 · Accepted Answer · answered Feb 22 '13 at 16:02

I am sure there is significant room for improvement in my solution, but I have come up with the following set of three functions doing the calculation for me in a semi-robust way.

def clusters(array,
             orig_indices = None,
             start = 0,
             rtol=numpy.allclose.__defaults__[0],
             atol=numpy.allclose.__defaults__[1]):
    """For an array, return a permutation that sorts the numbers and the sizes of the resulting blocks of identical numbers."""
    array = numpy.asarray(array)
    if not len(array):
        return numpy.array([]),[]
    if orig_indices is None:
        orig_indices = numpy.arange(len(array))
    x = array[0]
    close = abs(array-x) <= (atol + rtol*abs(x))
    first = sum(close)
    r_perm, r_sizes = clusters(
        array[~close],
        orig_indices[~close],
        start+first,
        rtol, atol)
    r_sizes.insert(0, first)
    return numpy.concatenate((orig_indices[close], r_perm)), r_sizes

def permutation_matrix(permutation, dtype=dtype):
    n = len(permutation)
    P = numpy.zeros((n,n), dtype)
    for i,j in enumerate(permutation):
        P[j,i]=1
    return P

def simultaneously_diagonalize(tensor, atol=numpy.allclose.__defaults__[1]):
    tensor = numpy.asarray(tensor)
    old_shape = tensor.shape
    size = old_shape[-1]
    tensor = tensor.reshape((-1, size, size))
    diag_mask = 1-numpy.eye(size)

    eigvalues, diagonalizer = numpy.linalg.eig(tensor[0])
    diagonalization = numpy.dot(
        numpy.dot(
            matrix.linalg.inv(diagonalizer),
            tensor).swapaxes(0,-2),
        diagonalizer)
    if numpy.allclose(diag_mask*diagonalization, 0):
        return diagonalization.diagonal(axis1=-2, axis2=-1).reshape(old_shape[:-1])
    else:
        perm, cluster_sizes = clusters(diagonalization[0].diagonal())
        perm_matrix = permutation_matrix(perm)
        diagonalization = numpy.dot(
            numpy.dot(
                perm_matrix.T,
                diagonalization).swapaxes(0,-2),
            perm_matrix)
        mask = 1-scipy.linalg.block_diag(
            *list(
                numpy.ones((blocksize, blocksize))
                for blocksize in cluster_sizes))
        print(diagonalization)
        assert(numpy.allclose(
                diagonalization*mask,
                0)) # Assert that the matrices are co-diagonalizable
        blocks = numpy.cumsum(cluster_sizes)
        start = 0
        other_part = []
        for block in blocks:
            other_part.append(
                simultaneously_diagonalize(
                    diagonalization[1:, start:block, start:block]))
            start = block
        return numpy.vstack(
            (diagonalization[0].diagonal(axis1=-2, axis2=-1),
             numpy.hstack(other_part)))

There is still a problem with this approach which I cannot pinpoint yet. Use with caution! — Anaphory, Feb 22 '13 at 16:55

score 1 · Answer 4 · answered Aug 22 '13 at 15:35

If you know something about the size of the eigenvalues of the two matrices in advance, you can diagonalize a linear combination of the two matrices, with coefficients chosen to break the degeneracy. For example, if the eigenvalues of both lie between -10 and 10, you could diagonalize 100*M1 + M2. There's a slight loss of precision, but for many purposes it's good enough--and quick and easy!

Simultaneously diagonalize matrices with numpy

4 Answers4