Efficient way of creating a matrix in numpy by duplicating a given array along a diagonal

Question

I have some array that I already know, say:

a = np.array([1,2,3])

I also know that I want a matrix of some size that has total length of length a + some amount n and width n+1, like so:

n = 4
length = len(a) + n
width = n + 1

I'm looking to create an array that looks like this:

array([[1,2,3,0,0,0,0],
       [0,1,2,3,0,0,0],
       [0,0,1,2,3,0,0],
       [0,0,0,1,2,3,0],
       [0,0,0,0,1,2,3]])

Unfortunately numpy.kron and in general block diagonals are not what I'm looking for, since that would cause the next row to increment by 3 instead of 1.

I have a way of doing it where I can create each row of the matrix by using a for loop and stacking the resulting arrays on top of each other as well as a method where I use scipy.sparse.diag to create the array using, again, a for loop, but I was wondering if there was a more efficient method.

[related](https://stackoverflow.com/a/49982967/7207392) – Paul Panzer Apr 24 '18 at 07:43 — Paul Panzer, Apr 24 '18 at 07:43

Divakar · Accepted Answer · 2018-04-24T08:08:53.820

Here's one with np.lib.stride_tricks.as_strided that gives us views into a zeros-padded array and as such are very efficient, both memory-wise and performance-wise -

def sliding_windows(a, n=4):
    length = len(a) + n
    width = n + 1
    z_pad = np.zeros(n,dtype=a.dtype)
    ac = np.r_[z_pad, a, z_pad]
    s = ac.strides[0]    
    strided = np.lib.stride_tricks.as_strided
    return strided(ac[n:], shape=(width, length), strides=(-s,s),writeable=False)

If you need a writable version, simply make a copy with sliding_windows(a, n=4).copy().

Sample runs -

In [42]: a
Out[42]: array([1, 2, 3])

In [43]: sliding_windows(a, n=4)
Out[43]: 
array([[1, 2, 3, 0, 0, 0, 0],
       [0, 1, 2, 3, 0, 0, 0],
       [0, 0, 1, 2, 3, 0, 0],
       [0, 0, 0, 1, 2, 3, 0],
       [0, 0, 0, 0, 1, 2, 3]])

In [44]: sliding_windows(a, n=5)
Out[44]: 
array([[1, 2, 3, 0, 0, 0, 0, 0],
       [0, 1, 2, 3, 0, 0, 0, 0],
       [0, 0, 1, 2, 3, 0, 0, 0],
       [0, 0, 0, 1, 2, 3, 0, 0],
       [0, 0, 0, 0, 1, 2, 3, 0],
       [0, 0, 0, 0, 0, 1, 2, 3]])

One more with array-assignment, which should be good if you need a writable version -

def sliding_windows_arrassign(a, n=4):
    pad_length = len(a) + n + 1
    width = n + 1
    p = np.zeros((width,pad_length),dtype=a.dtype)
    p[:,:len(a)] = a
    return p.ravel()[:-n-1].reshape(width,-1)

Benchmarking on larger arrays

1) 100 elements and similar n :

In [101]: a = np.arange(1,101)

In [102]: %timeit sliding_windows(a, n=len(a)+1)
100000 loops, best of 3: 17.6 µs per loop

In [103]: %timeit sliding_windows_arrassign(a, n=len(a)+1)
100000 loops, best of 3: 8.63 µs per loop

# @Julien's soln
In [104]: %%timeit
     ...: n = len(a)+1
     ...: m = np.tile(np.hstack((a,np.zeros(n+1))),n+1)[:(n+len(a))*(n+1)]
     ...: m.shape = (n+1, n+len(a))
100000 loops, best of 3: 15 µs per loop

2) ~5000 elements and similar n :

In [82]: a = np.arange(1,5000)

In [83]: %timeit sliding_windows(a, n=len(a)+1)
10000 loops, best of 3: 23.2 µs per loop

In [84]: %timeit sliding_windows_arrassign(a, n=len(a)+1)
10 loops, best of 3: 28.9 ms per loop

# @Julien's soln
In [91]: %%timeit
    ...: n = len(a)+1
    ...: m = np.tile(np.hstack((a,np.zeros(n+1))),n+1)[:(n+len(a))*(n+1)]
    ...: m.shape = (n+1, n+len(a))
10 loops, best of 3: 34.3 ms per loop

np.lib.stride_tricks.as_strided would have a constant runtime irrespective of the array length owing to the memory efficiency discussed earlier.

Thanks, this is good. I have to figure out what you did with np.r_ and the stride calls, but this does what I want as far as I can tell. — Alex Hu, Apr 24 '18 at 07:23
I get `TypeError: as_strided() got an unexpected keyword argument 'writeable'`... — Julien, Apr 24 '18 at 07:30
@Divakar: what's the benefit of creating a non-writable array? — Jonas, Apr 24 '18 at 07:35
@Jonas The huge performance benefit, which comes from the memory efficiency and that's because we are working with `views`, so no extra memory is needed for the output. The timings as shown in this post for large arrays should give some idea. For a writable one, we need `.copy()` and that's a significant overhead. — Divakar, Apr 24 '18 at 07:37

Julien · Answer 2 · 2018-04-24T07:28:32.640

0

Here's another more concise one, not sure how it compares for efficiency though:

a = np.array([1,2,3])

n = 4
m = np.tile(np.hstack((a,np.zeros(n+1))),n+1)[:(n+len(a))*(n+1)]
m.shape = (n+1, n+len(a))

Efficiency comparison (writable version):

import numpy as np

a = np.arange(100)
n = 100

def Julien(a, n=4):
    m = np.tile(np.hstack((a,np.zeros(n+1))),n+1)[:(n+len(a))*(n+1)]
    m.shape = (n+1, n+len(a))
    return m

def Divakar(a, n=4):
    length = len(a) + n
    width = n + 1
    z_pad = np.zeros(n,dtype=a.dtype)
    ac = np.r_[z_pad, a, z_pad]
    s = ac.strides[0]    
    strided = np.lib.stride_tricks.as_strided
    return strided(ac[n:], shape=(width, length), strides=(-s,s))

%timeit Julien(a)
%timeit Divakar(a)

18.1 µs ± 333 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
23.4 µs ± 1.05 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)

edited Apr 24 '18 at 07:28

answered Apr 24 '18 at 07:09

Julien

13,986
5
29
53

Thanks, this works too. I won't know if it's efficient enough until I code it into the raspberry pi and my group tests whether or not it's fast enough for our purposes, but it's good to have options. The only thing I've noticed different so far is that your output defaulted into a float, even though I entered an integer. However, this shouldn't be an issue as I'm expecting to work in float/double anyways. – Alex Hu Apr 24 '18 at 07:27
@AlexHu What's your typical length of `a` and typical value for `n` for your actual use-case? – Divakar Apr 24 '18 at 07:31
I am not sure yet. Specifically, I'm building this to collect information from a analog to discrete signal sent by a speaker. I'm expecting a to be in the 100s, but I have to check with a live test to find out. Similarly, the timestep is something that I have to personally adjust for the best performance. Right now I'm guessing in the magnitude of 10s. However, since the speaker sends out a chirp that is supposed to reflected back from 2 meters (group project is sonar), I'm working with fractions of a second in order to retrieve the right information. That's why I'm concerned with efficiency. – Alex Hu Apr 24 '18 at 07:42

score 0 · Answer 3 · answered Apr 24 '18 at 08:21

Here is an inbuilt (sort of) method. Not the fastest, though:

>>> import scipy.linalg as sl
>>> def f_pp(a, n=4):
...     pad = np.zeros((n,), a.dtype)
...     return sl.toeplitz(*map(np.concatenate, ((a[:1], pad), (a, pad))))
... 
>>> f_pp(np.array([1,2,3]))
array([[1, 2, 3, 0, 0, 0, 0],
       [0, 1, 2, 3, 0, 0, 0],
       [0, 0, 1, 2, 3, 0, 0],
       [0, 0, 0, 1, 2, 3, 0],
       [0, 0, 0, 0, 1, 2, 3]])

Efficient way of creating a matrix in numpy by duplicating a given array along a diagonal

3 Answers3

Benchmarking on larger arrays