3

I am looking for an efficient way (preferably a vectorized fast built-in function) to flatten a numpy array in diagonal order. For example:

A=np.array([[1,2,3],
            [4,5,6],
            [7,8,9]])
b=flatten_diagonally(A)

b should be [7,4,8,1,5,9,2,6,3].

A will be a very large matrix so I don't want to iterate over the elements individually. For the same reason I also do not want to prepare in advance a list of all the indices in the correct order. Since A is large and the result will be equally large, I would like to avoid solutions that use a lot of memory in addition.

It would be even better if I could specify which subset of diagonals I would like to flatten, e.g. flattening only 1st and 2nd diagonals will give [1,5,9,2,6].

Saullo G. P. Castro
  • 56,802
  • 26
  • 179
  • 234
Bitwise
  • 7,577
  • 6
  • 33
  • 50
  • 2
    You'll have to either iterate or use fancy indexing. It's a non-contiguous slice, so there's no way to do it with "normal" slicing. With fancy indexing, you'll need to prepare an array of the indices in advance. Therefore, this is probably best solved by iterating through. `numpy.fromiter` is often useful here if you want to avoid a copy in memory. – Joe Kington May 31 '13 at 21:48
  • 1
    That having been said, have a look at `scipy.sparse.dia_matrix` and `scipy.sparse.spdiags`. One way to accomplish what you want is to convert things temporarily to a `scipy.sparse.dia_matrix`, but this isn't going to be memory-efficient for a dense matrix. – Joe Kington May 31 '13 at 21:55
  • @JoeKington Thanks, unfortunately I thought so. I know this is not contiguous, but I was hoping that there might be a numpy function that is hard-coded in C so it could access these elements faster than just individual access through the Python - but I guess there is no such thing. – Bitwise May 31 '13 at 21:56
  • 2
    @JoeKington yes, I was thinking in this direction and am already looking into this. Thanks again. Perhaps I will also try to write a fast diagonal flattening with iteration in Cython. – Bitwise May 31 '13 at 21:58
  • 1
    If you go that route, it would be interesting to see how the Cython solution compares in speed to a python generator (with simple nested for loops) and `numpy.fromiter`. Cython should be a perfect fit for this, though! – Joe Kington May 31 '13 at 22:05

2 Answers2

0

numpy.diag returns the diagonal along a certain index. Documentation

So this should give you the desired output: (Note that the 0th diagonal is the normal diagonal, so if you want subdiagonals, you may need to use negative values for the diagonals.)

import numpy as np

def flatten_diagonally(npA, diagonals = None):
    diagonals = diagonals or xrange(-npA.shape[0] + 1, npA.shape[1])
    return np.concatenate(map(lambda x: np.diag(npA, k = x), diagonals))

Note that instead of np.diag, you could use np.diagonal, I'm not quite sure which works better. Documentation

James
  • 2,635
  • 5
  • 23
  • 30
  • Thanks, but I mentioned that the array will be very large (e.g. 10000x10000), so this solution will be inefficient, both because of the concatenation and the looping. – Bitwise May 31 '13 at 21:53
  • I don't think you'll be able to find a solution without looping -- at least the loops here are done by numpy's library which is optimized and may even be pushed down to c-code (I'm not sure on that one). – James May 31 '13 at 21:55
  • Actually it will not really be C-optimized, but I might try optimizing by implementing in Cython. – Bitwise May 31 '13 at 21:59
  • The loop over the 19999 diagonals would be optimized because map is written in C. It does return a list though, so if you wanted, you could use a generator comprehension instead. Unfortunately I cannot find the source code for numpy online, so I can't say much about that, but unless you write the actual code in another language then, you're going to have trouble optimizing going faster than numpy. – James May 31 '13 at 22:08
0

The following function is based on indices comparisons, based on the fact that each diagonal has an index relation, for example at the main diagonal i==j, and so on...

It is valid even for non-squared 2D arrays.

def flatten_diagonally(x, diags=None):
    diags = np.array(diags)
    if x.shape[1] > x.shape[0]:
        diags += x.shape[1]-x.shape[0]
    n = max(x.shape)
    ndiags = 2*n-1
    i,j = np.indices(x.shape)
    d = np.array([])
    for ndi in range(ndiags):
        if diags != None:
            if not ndi in diags:
                continue
        d = np.concatenate((d,x[i==j+(n-1)-ndi]))
    return d

Examples:

print flatten_diagonally(A)
#[ 7.  4.  8.  1.  5.  9.  2.  6.  3.]

print flatten_diagonally(A, diags=(1,2))
#[ 4.  8.  1.  5.  9.]

For non-squared arrays:

A=np.array([[1,2,3],
            [7,8,9]])
print flatten_diagonally(A, diags=(1,2))
#[ 1.  8.  2.  9.]
Saullo G. P. Castro
  • 56,802
  • 26
  • 179
  • 234