
I'm trying to reimplement NumPy's as_strided function in plain Python. Before calling it, I convert the strides from bytes to element counts according to the dtype (for float32 I divide each stride by 4, and so on).
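As a minimal sketch of that byte-to-element conversion (the helper name `byte_strides_to_element_strides` is hypothetical, not part of NumPy), the divisor can be taken from the dtype's itemsize instead of hardcoding 4:

```python
import numpy as np

def byte_strides_to_element_strides(strides, dtype):
    """Convert NumPy byte strides to element strides for the given dtype.

    Hypothetical helper for illustration only.
    """
    itemsize = np.dtype(dtype).itemsize  # 4 for float32, 8 for float64, ...
    if any(s % itemsize for s in strides):
        raise ValueError("every stride must be a multiple of the itemsize")
    return tuple(s // itemsize for s in strides)

# The byte strides used later in this question, for float32 data
print(byte_strides_to_element_strides((4, 120, 120), np.float32))  # (1, 30, 30)
```

Element strides obtained this way only describe positions in the underlying memory buffer, so they match a flattened copy only if that copy has the same element order as the buffer.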

The code I implemented:

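import numpy as np

def as_strided(x, shape, strides):
    # strides are given in elements (not bytes); shape must have 3 dimensions
    x = x.flatten()  # copies the elements in C (row-major) order
    size = 1
    for value in shape:
        size *= value
    arr = np.zeros(size, dtype=np.float32)
    curr = 0
    for i in range(shape[0]):
        for j in range(shape[1]):
            for k in range(shape[2]):
                # offset of element (i, j, k) in the flattened array
                arr[curr] = x[i * strides[0] + j * strides[1] + k * strides[2]]
                curr = curr + 1
    return np.reshape(arr, shape)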

In order to test the function I wrote 2 auxiliary functions:

def sliding_window(x, shape, strides):
    f_mine = as_strided(x, shape, [stride // 4 for stride in strides])
    f_np = np.lib.stride_tricks.as_strided(x, shape=shape, strides=strides).copy()
    check_strides(x.flatten(), f_mine)
    check_strides(x.flatten(), f_np)
    return f_mine, f_np

def check_strides(original, strided):
    s1 = int(np.where(original == strided[1][0][0])[0])
    s2 = int(np.where(original == strided[0][1][0])[0])
    s3 = int(np.where(original == strided[0][0][1])[0])
    print([s1, s2, s3])
    return [s1, s2, s3]

In the main code, I selected some shape and strides values and ran 2 cases:

  1. Loaded a .npy file that contains a float32 matrix - variable x.
  2. Created a random matrix of the same size and type as variable x - variable y.

When I check the strides of the resulting arrays, I see something strange. In case 1, the strides recovered from the NumPy result differ from the requested strides (and from my implementation). In case 2, the outputs are identical.

The main code:

shape = (30, 818, 300)
strides = (4, 120, 120)

# case 1
x = np.load('x.npy')
s_mine, s_np = sliding_window(x, shape, strides)
print(np.array_equal(s_mine, s_np))

#case 2
y = np.random.randn(x.shape[0], x.shape[1]).astype(np.float32)
s_mine, s_np = sliding_window(y, shape, strides)
print(np.array_equal(s_mine, s_np))

Here you can find the x.npy file that triggers this stride change in the NumPy function. I'd be grateful if anyone could explain why this is happening.

  • Here is a link to the documentation of the function: https://numpy.org/doc/stable/reference/generated/numpy.lib.stride_tricks.as_strided.html . It will be helpful to write what this function is meant to do. – Triceratops Feb 23 '22 at 18:44
  • When I first read "naive function in Python" I thought you were trying to replicate it with lists, which have a totally different storage and indexing model. But no, you are trying to replicate it with arrays - but without taking full advantage of the numpy capabilities. Do you understand what `as_strided` is doing? How it is creating a `view` with different shape and strides? – hpaulj Feb 23 '22 at 19:05

1 Answer


I downloaded x.npy, loaded it, and ran as_strided on y. I haven't looked at your code.

Normally when playing with as_strided I like to look at the arrays, but in this case they are large enough that I'll focus more on making sense of the strides and shape.

In [39]: x.shape, x.strides
Out[39]: ((30, 1117), (4, 120))
In [40]: y.shape, y.strides
Out[40]: ((30, 1117), (4468, 4))

I wondered where you got the

shape = (30, 818, 300)
strides = (4, 120, 120)

OK, the 30 is shared, but the 4 applies only to x. With those strides, x looks like it's F ordered, maybe even a transpose of a (1117,30) array. Your y, which was constructed with random, has the typical strides of a C ordered array: 4 bytes for the inner, trailing dimension, and 4*1117 for the leading dimension.
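A small sketch of how this layout difference could produce the mismatch, using a 12-element stand-in rather than x.npy (the names `x_like`, `y_like` and `naive` are hypothetical). It relies on two facts: np.lib.stride_tricks.as_strided steps through the underlying memory buffer, while flatten() returns the elements in logical C order. The two orders coincide only for a C-contiguous array.

```python
import numpy as np
from numpy.lib.stride_tricks import as_strided

base = np.arange(12, dtype=np.float32)
x_like = base.reshape(4, 3).T          # F ordered view: shape (3, 4), strides (4, 12)
y_like = np.ascontiguousarray(x_like)  # same values, C ordered: strides (16, 4)

print(x_like.strides, y_like.strides)  # (4, 12) (16, 4)

shape, strides = (2, 2), (4, 12)       # byte strides, i.e. 1 and 3 elements

# NumPy walks the raw buffer, so the result depends on the memory layout
np_x = as_strided(x_like, shape=shape, strides=strides).copy()
np_y = as_strided(y_like, shape=shape, strides=strides).copy()

def naive(arr):
    # Same idea as the question's loop: index into a C order flattened copy
    flat = arr.flatten()
    return np.array([[flat[i * 1 + j * 3] for j in range(2)] for i in range(2)],
                    dtype=arr.dtype)

print(np.array_equal(naive(x_like), np_x))  # False: buffer order != C order
print(np.array_equal(naive(y_like), np_y))  # True: buffer order == C order
```

If the loop is meant to mirror NumPy for an F ordered input, flattening with `order='F'` (or `order='K'`) instead of the default would be one option to try, since for this kind of array that follows the memory order.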

hpaulj