I have a sparse, upper-triangular matrix (e.g. a distance matrix). In practice, this would be a highly sparse distance matrix larger than 1M x 1M.
import numpy as np
from scipy.sparse import csr_matrix
X = csr_matrix([
[1, 2, 3, 3, 1],
[0, 1, 3, 3, 2],
[0, 0, 1, 1, 3],
[0, 0, 0, 1, 3],
[0, 0, 0, 0, 1],
])
I want to subset this matrix to another triangular distance matrix. The indexes may be ordered differently and/or duplicated.
idx = np.array([[1, 2, 4, 2]])  # 2-D row vector, so idx.T is a column
X2 = X[idx.T, idx]
As a result, the subset may no longer be triangular: some values are missing from the upper triangle, while others show up, duplicated, in the lower triangle.
>>> X2.toarray()
array([[1, 3, 2, 3],
[0, 1, 3, 1],
[0, 0, 1, 0],
[0, 1, 3, 1]])
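The reason is that entry (i, j) of the subset is X[idx[i], idx[j]], which is zero in an upper-triangular X whenever idx[i] > idx[j], regardless of where i and j sit relative to each other. A small sketch to confirm this on the example (it uses a plain 1-D index array and chained row/column indexing, which is an assumption made for illustration and not the indexing style used above):

```python
import numpy as np
from scipy.sparse import csr_matrix, tril

X = csr_matrix(np.array([
    [1, 2, 3, 3, 1],
    [0, 1, 3, 3, 2],
    [0, 0, 1, 1, 3],
    [0, 0, 0, 1, 3],
    [0, 0, 0, 0, 1],
]))
idx = np.array([1, 2, 4, 2])

# Naive subset: pick rows first, then columns
X2 = X[idx][:, idx]

# Count stored values strictly below the diagonal;
# a properly triangular result would have none
n_below = tril(X2, k=-1).nnz
print(n_below)  # 2 for this example
```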
How can I get the correct upper-triangular matrix as efficiently as possible? Currently, I mirror the matrix before subsetting and reduce the result to its upper triangle afterwards. This doesn't feel particularly efficient, since it requires, at a minimum, duplicating all entries.
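import scipy.sparse

# Mirror the upper triangle into the lower one (transpose method),
# see https://stackoverflow.com/a/58806735/2340703
X = X + X.T - scipy.sparse.diags(X.diagonal())
# Subset the symmetric matrix, then keep only the upper triangle
X2 = X[idx.T, idx]
X2 = scipy.sparse.triu(X2, k=0, format="csr")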
>>> X2.toarray()
array([[1., 3., 2., 3.],
[0., 1., 3., 1.],
[0., 0., 1., 3.],
[0., 0., 0., 1.]])
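One possible variation on the mirror-then-subset approach above is to subset first and mirror only the small result, so the full matrix is never duplicated. The sketch below assumes all stored values are non-negative (as distances are), because it merges the subset with its transpose via an element-wise maximum. The helper name `subset_upper` is hypothetical, and this has not been benchmarked on a matrix of the size in question.

```python
import numpy as np
from scipy.sparse import csr_matrix, triu

def subset_upper(X, idx):
    """Hypothetical helper: subset an upper-triangular CSR matrix and
    return the upper triangle of the corresponding symmetric subset.
    Assumes all stored values are non-negative."""
    idx = np.asarray(idx).ravel()
    # Subset rows, then columns; duplicates and reordering are allowed
    S = X[idx][:, idx]
    # For each pair, at most one of S[i, j] and S[j, i] is non-zero
    # (both are equal when idx[i] == idx[j]), so the maximum mirrors S
    sym = S.maximum(S.T)
    return triu(sym, k=0, format="csr")

X = csr_matrix(np.array([
    [1, 2, 3, 3, 1],
    [0, 1, 3, 3, 2],
    [0, 0, 1, 1, 3],
    [0, 0, 0, 1, 3],
    [0, 0, 0, 0, 1],
]))
X2 = subset_upper(X, [1, 2, 4, 2])
print(X2.toarray())
```

On the example this gives the same values as the output above, while the mirroring cost depends on the size of the subset and not on the full matrix.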