3

I want to select a submatrix of a numpy matrix based on whether the diagonal is less than some cutoff value. For example, given the matrix:

Test = array([[1,2,3,4,5],
              [2,3,4,5,6],
              [3,4,5,6,7],
              [4,5,6,7,8],
              [5,6,7,8,9]])

I want to select the rows and columns where the diagonal value is less than, say, 6. In this example, the diagonal values are sorted, so that I could just take Test[:3,:3], but in the general problem I want to solve this isn't the case.

The following snippet works:

def MatrixCut(M,Ecut):
    D = diag(M)
    indices = D<Ecut
    n = sum(indices)
    NewM = zeros((n,n),'d')
    ii = -1
    for i,ibool in enumerate(indices):
        if ibool:
            ii += 1
            jj = -1
            for j,jbool in enumerate(indices):
                if jbool:
                    jj += 1
                    NewM[ii,jj] = M[i,j]
    return NewM

print MatrixCut(Test,6)
[[ 1.  2.  3.]
 [ 2.  3.  4.]
 [ 3.  4.  5.]]

However, this is fugly code, with all kinds of dangerous things like initializing the ii/jj indices to -1, which won't cause an error if somehow I get into the loop and take M[-1,-1].

Plus, there must be a numpythonic way of doing this. For a one-dimensional array, you could do:

D = diag(A)
A[D<Ecut]

But the analogous thing for a 2d array doesn't work:

D = diag(Test)
Test[D<6,D<6]
array([1, 3, 5])

Is there a good way to do this? Thanks in advance.

Rick
  • 1,784
  • 3
  • 15
  • 25

2 Answers2

3

This also works when the diagonals are not sorted:

In [7]: Test = array([[1,2,3,4,5],
              [2,3,4,5,6],
              [3,4,5,6,7],
              [4,5,6,7,8],
              [5,6,7,8,9]])

In [8]: d = np.argwhere(np.diag(Test) < 6).squeeze()

In [9]: Test[d][:,d]
Out[9]: 
array([[1, 2, 3],
       [2, 3, 4],
       [3, 4, 5]])

Alternately, to use a single subscript call, you could do:

In [10]: d = np.argwhere(np.diag(Test) < 6)

In [11]: Test[d, d.flat]
Out[11]: 
array([[1, 2, 3],
       [2, 3, 4],
       [3, 4, 5]])

[UPDATE]: Explanation of the second form.

At first, it may be tempting to just try Test[d, d] but that will only extract elements from the diagonal of the array:

In [75]: Test[d, d]
Out[75]: 
array([[1],
       [3],
       [5]])

The problem is that d has shape (3, 1) so if we use d in both subscripts, the output array will have the same shape as d. The d.flat is equivalent to using d.flatten() or d.ravel() (except flat just returns an iterator instead of an array). The effect is that the result has shape (3,):

In [76]: d
Out[76]: 
array([[0],
       [1],
       [2]])

In [77]: d.flatten()
Out[77]: array([0, 1, 2])

In [79]: print d.shape, d.flatten().shape
(3, 1) (3,)

The reason Test[d, d.flat] works is because numpy's general broadcasting rules cause the last dimension of d (which is 1) to be broadcast to the last (and only) dimension of d.flat (which is 3). Similarly, d.flat is broadcast to match the first dimension of d. The result is two (3,3) index arrays, which are equivalent to the following arrays i and j:

In [80]: dd = d.flatten()

In [81]: i = np.hstack((d, d, d)

In [82]: j = np.vstack((dd, dd, dd))

In [83]: print i
[[0 0 0]
 [1 1 1]
 [2 2 2]]

In [84]: print j
[[0 1 2]
 [0 1 2]
 [0 1 2]]

And just to make sure they work:

In [85]: Test[i, j]
Out[85]: 
array([[1, 2, 3],
       [2, 3, 4],
       [3, 4, 5]])
bogatron
  • 18,639
  • 6
  • 53
  • 47
2

The only way I found to solve your task is somewhat tricky

>>> Test[[[i] for i,x in enumerate(D<6) if x], D<6]
array([[1, 2, 3],
       [2, 3, 4],
       [3, 4, 5]])

possibly not the best one. Based on this answer. Or (thanks to @bogatron or reminding me argwhere):

>>> Test[np.argwhere(D<6), D<6]
array([[1, 2, 3],
       [2, 3, 4],
       [3, 4, 5]])
Community
  • 1
  • 1
alko
  • 46,136
  • 12
  • 94
  • 102