
Suppose I have an array [1,2,3,4,5,6,7,8] composed of two samples, [1,2,3,4] and [5,6,7,8]. For each sample, I want to apply a sliding window of size n, and if there are not enough elements, pad the window with the sample's last element. Each row of the result should be the window starting at the element for that row.

For example: if n=3, then the result should be:

[[1,2,3],
 [2,3,4],
 [3,4,4],
 [4,4,4],
 [5,6,7],
 [6,7,8],
 [7,8,8],
 [8,8,8]]

How can I achieve this with efficient slicing instead of a for loop? Thanks.

henrywongkk
tczj
  • What did you try? Show us your code. Please, check ["How to create a Minimal, Complete, and Verifiable example"](https://stackoverflow.com/help/mcve) and ["How to ask"](https://stackoverflow.com/help/how-to-ask). You will get better results by following the tips on those articles. – accdias Jan 18 '20 at 02:06
  • How do you define the number of samples since the array is uni-dimensional? Also tell us if the number of elements will always be even, and what is the result for an empty array. – accdias Jan 18 '20 at 02:10
  • Could the first array be in any order and of any length? If so, modify the example to be more general. Otherwise we'd be tempted to provide a solution that ignores it. – hpaulj Jan 18 '20 at 02:11
  • There are fast moving-window methods (based on `as_strided`), but I don't think they help here. This has 3 complicating issues: the order determined by one array, the 2 (or more) sample rows, and the padding. I'd suggest developing a good loop-based solution first, and then worrying about whether it can be improved. – hpaulj Jan 18 '20 at 05:05
  • The padding isn't too difficult. Just expand the samples beforehand to `[1,2,3,4,4,4]` and pick a slice `l[j:j+n]`. A pure Python list solution might be fast enough. – hpaulj Jan 18 '20 at 05:11
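The padding-and-slice idea from these comments, written out as a loop-based reference sketch. It assumes the flat array splits into equal-length samples of `sample_len` elements; `sliding_windows_padded` and `sample_len` are hypothetical names, not from the thread.

```python
def sliding_windows_padded(flat, sample_len, n):
    """Return one length-n window per element, padding each sample with its last element.

    Assumes len(flat) is a multiple of sample_len (hypothetical helper).
    """
    result = []
    for start in range(0, len(flat), sample_len):
        sample = flat[start:start + sample_len]
        # Pad with n - 1 copies of the last element so every slice has length n
        padded = sample + [sample[-1]] * (n - 1)
        for j in range(len(sample)):
            result.append(padded[j:j + n])
    return result


print(sliding_windows_padded([1, 2, 3, 4, 5, 6, 7, 8], sample_len=4, n=3))
```

This is the baseline the vectorized answers can be checked against.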

3 Answers


A similar approach to @hpaulj's, using some NumPy built-in functionality:

import numpy as np


samples = [[1,2,3,4],[5,6,7,8]]
ws = 3 #window size

# add padding
samples = [s + [s[-1]]*(ws-1) for s in samples]

# rolling window function for arrays
def rolling_window(a, window):
    shape = a.shape[:-1] + (a.shape[-1]-window+1, window)
    strides = a.strides + (a.strides[-1],)
    return np.lib.stride_tricks.as_strided(a, shape=shape, strides=strides)


result = sum([rolling_window(np.array(s), ws).tolist() for s in samples ], [])

result
[[1, 2, 3],
 [2, 3, 4],
 [3, 4, 4],
 [4, 4, 4],
 [5, 6, 7],
 [6, 7, 8],
 [7, 8, 8],
 [8, 8, 8]]
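On NumPy 1.20 or newer, `numpy.lib.stride_tricks.sliding_window_view` builds the same kind of strided view without hand-computing strides, and `np.pad(..., mode="edge")` handles the padding. A sketch of that swapped-in variant, assuming all samples have the same length so they fit in one 2-D array:

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view  # requires NumPy >= 1.20

samples = np.array([[1, 2, 3, 4], [5, 6, 7, 8]])
ws = 3  # window size

# Repeat each row's last element ws - 1 times along the last axis (edge padding)
padded = np.pad(samples, ((0, 0), (0, ws - 1)), mode="edge")

# Read-only view of shape (2, 4, ws): one window per original element
windows = sliding_window_view(padded, ws, axis=1)

# Merge the sample axis into the window axis to get the (8, ws) layout
result = windows.reshape(-1, ws)
print(result)
```

This avoids the per-sample Python loop and the `sum(..., [])` list concatenation; `reshape` copies here because the view is not contiguous.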
FBruzzesi

A pure Python list approach:

In [201]: order = [1,3,2,3,5,8]                                                                  
In [202]: samples = [[1,2,3,4],[5,6,7,8]]

Expand the samples to take care of the padding issue (here n = 3, the window size):

In [203]: samples = [row+([row[-1]]*n) for row in samples]                                       
In [204]: samples                                                                                
Out[204]: [[1, 2, 3, 4, 4, 4, 4], [5, 6, 7, 8, 8, 8, 8]]

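Define a function that returns the window starting at value i:

def foo(i, samples):
    # Return the length-n window starting at the first occurrence of value i;
    # n (the window size) is taken from the enclosing scope
    for row in samples:
        try:
            j = row.index(i)
        except ValueError:
            continue  # i is not in this row; try the next one
        return row[j:j+n]
    # if i is not found in any row, the function falls through and returns None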
In [207]: foo(3,samples)                                                                         
Out[207]: [3, 4, 4]
In [208]: foo(9,samples)  # the not-found case isn't handled well: it silently returns None

Apply it to every element of order:

In [209]: [foo(i,samples) for i in order]                                                        
Out[209]: [[1, 2, 3], [3, 4, 4], [2, 3, 4], [3, 4, 4], [5, 6, 7], [8, 8, 8]]
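If `order` is long, calling `foo` rescans the rows for every value. A sketch that builds a value-to-window dictionary once instead, assuming the values are hashable; `build_window_lookup` is a hypothetical name. As with `foo`, the first occurrence of a repeated value wins, but a missing value raises `KeyError` rather than returning `None`.

```python
def build_window_lookup(samples, n):
    """Map each value to the length-n window starting at its first occurrence."""
    lookup = {}
    for row in samples:
        # Pad with n - 1 copies of the last element so every slice has length n
        padded = row + [row[-1]] * (n - 1)
        for j, value in enumerate(row):
            # setdefault keeps the first occurrence, like foo's row-by-row search
            lookup.setdefault(value, padded[j:j + n])
    return lookup


order = [1, 3, 2, 3, 5, 8]
samples = [[1, 2, 3, 4], [5, 6, 7, 8]]
lookup = build_window_lookup(samples, 3)
print([lookup[i] for i in order])
```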
hpaulj

I have a simple one-liner:

import numpy as np 
samples = np.array([[1,2,3,4],[5,6,7,8]]) 
n,d = samples.shape 
ws = 3

result = samples[:,np.minimum(np.arange(d)[:,None]+np.arange(ws)[None,:],d-1)]

The output is:

[[[1 2 3]
  [2 3 4]
  [3 4 4]
  [4 4 4]]
 [[5 6 7]
  [6 7 8]
  [7 8 8]
  [8 8 8]]]

No loop, only broadcasting, so this is probably among the most efficient ways of doing it. The output has shape (2, 4, 3) rather than the (8, 3) you asked for, but that is easy to correct with a simple np.reshape, e.g. result.reshape(-1, ws).
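The reshape mentioned above, as a sketch that also starts from the flat array in the question. It assumes the flat array splits into equal-length samples; `sample_len` is an assumed parameter, not part of the original one-liner.

```python
import numpy as np

flat = np.array([1, 2, 3, 4, 5, 6, 7, 8])
sample_len = 4  # assumed: every sample has the same length
ws = 3          # window size

samples = flat.reshape(-1, sample_len)  # shape (2, 4)
n, d = samples.shape

# idx[i, k] = min(i + k, d - 1): window start i, offset k, clipped to the last column
idx = np.minimum(np.arange(d)[:, None] + np.arange(ws)[None, :], d - 1)

# Fancy indexing gives shape (n, d, ws); merging the first two axes gives (n * d, ws)
result = samples[:, idx].reshape(n * d, ws)
print(result)
```

Clipping the index with `np.minimum` is what implements the "pad with the last element" rule, so no padded copy of the data is needed.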
lrnv