I am generating sliding windows with np.lib.stride_tricks.as_strided, using the following code:

import numpy as np

wsize=4
overlap=0
vector=np.array(range(31))
fillval=np.nan

part_to_fill=np.full(wsize - (vector.shape[0] - 1) % wsize - 1,fillval)
a_ext = np.concatenate(( vector,part_to_fill))
n = a_ext.strides[0]
strided = np.lib.stride_tricks.as_strided   
res=strided(a_ext, shape=(vector.shape[0],wsize), strides=(n,n))[[np.arange(0,len(vector),wsize-overlap)],:]  

With overlap=0 everything is fine and I get:

array([[[  0.,   1.,   2.,   3.],
        [  4.,   5.,   6.,   7.],
        [  8.,   9.,  10.,  11.],
        ..., 
        [ 20.,  21.,  22.,  23.],
        [ 24.,  25.,  26.,  27.],
        [ 28.,  29.,  30.,  nan]]])

However, with overlap=1 I get the following, which is unexpected because:

  • the results are cast to float
  • the output contains random numbers instead of the expected nans, e.g. -3.25951556e-311

    array([[[  0.00000000e+000,   1.00000000e+000,   2.00000000e+000,
           3.00000000e+000],
        [  3.00000000e+000,   4.00000000e+000,   5.00000000e+000,
           6.00000000e+000],
        [  6.00000000e+000,   7.00000000e+000,   8.00000000e+000,
           9.00000000e+000],
        ..., 
        [  2.40000000e+001,   2.50000000e+001,   2.60000000e+001,
           2.70000000e+001],
        [  2.70000000e+001,   2.80000000e+001,   2.90000000e+001,
           3.00000000e+001],
        [  3.00000000e+001,               nan,   0.00000000e+000,
          -3.25951556e-311]]])
    

Even if I cast the results back to int using

res.astype(int)

I get the following, which might be even worse:

array([[[          0,           1,           2,           3],
        [          3,           4,           5,           6],
        [          6,           7,           8,           9],
        ..., 
        [         24,          25,          26,          27],
        [         27,          28,          29,          30],
        [         30, -2147483648,           0,           0]]])
00__00__00
  • `wsize - (vector.shape[0] - 1) % (wsize - overlap) - 1` – MB-F Jan 26 '18 at 10:20
  • Note that casting to float also happens in the first case (see the dot after the numbers) because integer data types cannot represent NaN. The "random" numbers are uninitialized memory. – MB-F Jan 26 '18 at 10:21
  • `wsize - (vector.shape[0] - 1) % (wsize - overlap) - 1` is not working yet; please give it a try with wsize=5, shape=23, overlap=2 – 00__00__00 Jan 26 '18 at 10:59
  • `a_ext` is float because of the float `nan` addition. If the striding adds other random values, you haven't got the fill right. – hpaulj Jan 26 '18 at 12:16

1 Answer

np.nan is a float. Concatenating that to an integer array produces a float array.

In [101]: x = np.arange(5)

In [102]: x1 = np.concatenate((x, np.full(3, np.nan))); x1
Out[102]: array([  0.,   1.,   2.,   3.,   4.,  nan,  nan,  nan])

In [106]: n=x1.strides[0]
In [107]: strided(x1, shape=(5,3), strides=(n,n))
Out[107]: 
array([[  0.,   1.,   2.],
       [  1.,   2.,   3.],
       [  2.,   3.,   4.],
       [  3.,   4.,  nan],
       [  4.,  nan,  nan]])

If I hadn't padded it with enough nan values, I would have gotten 'random' values in those extra slots. This is part of why as_strided is advanced, and potentially dangerous.
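
One way to catch this before it happens is to check whether the requested shape and strides stay inside the padded buffer. The sketch below is only an illustration: fits_in_buffer is a hypothetical helper (not part of NumPy), and it assumes a contiguous 1-D source array, non-negative strides and dimensions of at least 1.

```python
import numpy as np

def fits_in_buffer(arr, shape, strides):
    """Return True if a view with this shape/strides stays inside arr's data.

    Hypothetical helper: assumes arr is 1-D and contiguous, every stride is
    non-negative and every dimension is at least 1.
    """
    # Byte offset of the last element the strided view would touch.
    last_offset = sum((dim - 1) * step for dim, step in zip(shape, strides))
    return last_offset + arr.itemsize <= arr.nbytes

x1 = np.concatenate((np.arange(5), np.full(3, np.nan)))  # 8 float64 values
n = x1.strides[0]

ok_small = fits_in_buffer(x1, (3, 3), (2 * n, n))  # stays inside the buffer
ok_large = fits_in_buffer(x1, (5, 3), (2 * n, n))  # would read past the end
```

Running such a check before calling as_strided turns a silent read of uninitialized memory into an explicit failure that you can handle.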

I don't see why you are applying that overlap via indexing after striding. Here's how I'd do the overlap by adjusting the strides:

In [110]: strided(x1, shape=(5,3), strides=(2*n,n))
Out[110]: 
array([[  0.00000000e+000,   1.00000000e+000,   2.00000000e+000],
       [  2.00000000e+000,   3.00000000e+000,   4.00000000e+000],
       [  4.00000000e+000,               nan,               nan],
       [              nan,               nan,               nan],
       [              nan,               nan,   2.59784163e-306]])

Oops, I've asked for too big of an array (or not padded enough):

In [112]: strided(x1, shape=(3,3), strides=(2*n,n))
Out[112]: 
array([[  0.,   1.,   2.],
       [  2.,   3.,   4.],
       [  4.,  nan,  nan]])
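
As a rough general form of this approach, the row count and the pad length can both be derived from the step between window starts (wsize - overlap). This is a minimal sketch, not a definitive implementation: strided_windows is a hypothetical helper name, and it assumes a non-empty 1-D input and 0 <= overlap < wsize. The pad length it computes is algebraically the same as the expression suggested in the comments on the question.

```python
import numpy as np
from numpy.lib.stride_tricks import as_strided

def strided_windows(vector, wsize, overlap=0, fillval=np.nan):
    """Windows of length wsize whose starts are (wsize - overlap) apart.

    Hypothetical helper: assumes a non-empty 1-D input and
    0 <= overlap < wsize.
    """
    vector = np.asarray(vector)
    step = wsize - overlap
    # One window per start position 0, step, 2*step, ... below len(vector).
    n_rows = (len(vector) - 1) // step + 1
    # Pad so that the last window ends exactly at the end of the buffer.
    n_pad = (n_rows - 1) * step + wsize - len(vector)
    a_ext = np.concatenate((vector, np.full(n_pad, fillval)))
    n = a_ext.strides[0]
    view = as_strided(a_ext, shape=(n_rows, wsize), strides=(step * n, n))
    # Return a copy so callers never write through overlapping memory.
    return view.copy()

res = strided_windows(np.arange(31), wsize=4, overlap=1)
res2 = strided_windows(np.arange(23), wsize=5, overlap=2)
```

Because the shape and the padding come from the same step, the view cannot reach past the end of a_ext.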

Your code adds a single nan fill. Let's change that to 10 (just a convenient larger number) and calculate without the indexing, to get all the strided rows:

In [123]: res.shape
Out[123]: (31, 4)

In [124]: res
Out[124]: 
array([[  0.,   1.,   2.,   3.],
       [  1.,   2.,   3.,   4.],
       [  2.,   3.,   4.,   5.],
       [  3.,   4.,   5.,   6.],
       ...
       [ 27.,  28.,  29.,  30.],
       [ 28.,  29.,  30.,  nan],
       [ 29.,  30.,  nan,  nan],
       [ 30.,  nan,  nan,  nan]])

Now you can select every k-th row (with k = wsize - overlap), without any funny values (except for the float nan).
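
The session above does not show the call that produced res, so the following is a reconstruction under assumptions: it pads with 10 nan values as described, builds all 31 rows, and then keeps every (wsize - overlap)-th row. The names all_rows and windows are introduced here only for illustration.

```python
import numpy as np
from numpy.lib.stride_tricks import as_strided

wsize, overlap = 4, 1
vector = np.arange(31)

# Generous padding: 10 NaNs, as in the text (wsize - 1 would already suffice).
a_ext = np.concatenate((vector, np.full(10, np.nan)))
n = a_ext.strides[0]

# One row per start position; the last row reads indices 30..33,
# which is inside the 41-element padded buffer.
all_rows = as_strided(a_ext, shape=(len(vector), wsize), strides=(n, n))

# Keep every (wsize - overlap)-th row to get the overlapping windows.
windows = all_rows[::wsize - overlap]
```

With wsize=4 and overlap=1 this keeps the rows starting at 0, 3, 6, ..., 30, and the last window is filled with nan rather than stray memory.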

So as_strided requires proper strides, a proper shape, and proper padding.
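
If getting all three right by hand is a concern, newer NumPy releases (1.20 and later) provide np.lib.stride_tricks.sliding_window_view, which derives the shape and strides itself and returns a read-only view. This is a different technique from the as_strided calls above, sketched here assuming that NumPy version is available; the padding still has to be supplied by the caller.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

wsize, overlap = 4, 1
step = wsize - overlap
vector = np.arange(31)

# Pad with wsize - 1 NaNs so that every start position has a full window.
a_ext = np.concatenate((vector, np.full(wsize - 1, np.nan)))

# NumPy computes shape and strides; slicing keeps every step-th window.
windows = sliding_window_view(a_ext, wsize)[::step]
```

A window that does not fit into the array raises an error here instead of reading past the buffer.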

hpaulj
  • Nice answer, thanks. Very useful, especially the second part. Obviously I have misused strides by obtaining the overlap via indexing. Could you please add how to do it correctly in a general form? – 00__00__00 Jan 27 '18 at 11:14