I am trying to build a sliding window that slides over the numerical values of the elements in a list. This is important and, I believe, different from other sliding window approaches found on SO, in which the window usually slides over the indexes of the list.
What I mean is something like the following. Given the list of integers
li = [1, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]
with window=3 and step=2, the expected output would be:
[1, 3]
[3, 4, 5]
[5, 6, 7]
[7, 8, 9]
[9, 10, 11]
[11, 12]
The code I have so far:
window = 3
step = 2
last_pos = 0
w_start = 1
w_end = window
next_start = w_start + step
dat = []  # values for window
next_dat = []  # values for the next window
li = [1, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]
for e in li:
    ipos = int(e)
    if ipos > last_pos:
        dat.append(ipos)
        if ipos == w_end:  # end of window
            w_start += step
            w_end += step
            print(dat)
            dat = next_dat  # reset window...
        if ipos >= next_start:  # ipos is in the next window
            next_dat.append(ipos)
            if w_start == next_start:  # move next window
                next_start += step
                next_dat = []  # reset next window...
    else:
        raise Exception('List is not sorted')
    last_pos += 1

# the last window if not empty
if dat:
    print(dat)
The output is as expected:
[1, 3]
[3, 4, 5]
[5, 6, 7]
[7, 8, 9]
[9, 10, 11]
[11, 12]
However, besides not being very elegant, this code seems to fail when more than two windows overlap. For example, with window=5 and step=2 it produces the wrong output:
[1, 3, 4, 5]
[3, 4, 5, 6, 7]
[6, 7, 8, 9]
[8, 9, 10, 11]
[10, 11, 12]
The 1st and 2nd windows are OK, but from the 3rd onwards things get messy. For example, the third window should have started at 5 and should have five elements, not four. I'm aiming to get the following windows instead:
[1, 3, 4, 5]
[3, 4, 5, 6, 7]
[5, 6, 7, 8, 9]
[7, 8, 9, 10, 11]
[9, 10, 11, 12]
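One possible way to get these windows is sketched below. It is only a sketch under some assumptions: the values are strictly increasing integers, the first window starts at 1 (as w_start does in the code above), empty windows are skipped, and no further window is emitted once the data runs out. The name value_windows and its start parameter are made up for illustration. It keeps a single buffer for any number of overlapping windows, and that buffer never holds more than one window's worth of values.

```python
from collections import deque


def value_windows(values, window, step, start=1):
    """Yield lists of values lying in [w_start, w_start + window - 1].

    Hypothetical helper: `values` can be any iterable of strictly
    increasing integers; `start` (assumed to be 1) is the first w_start.
    """
    buf = deque()              # values currently inside the window
    w_start = start
    w_end = start + window - 1
    last = None
    for v in values:
        if last is not None and v <= last:
            raise ValueError('List is not sorted')
        last = v
        # v is past the current window: emit it and slide until v fits
        while v > w_end:
            if buf:            # assumption: empty windows are skipped
                yield list(buf)
            w_start += step
            w_end += step
            # drop values that fell out on the left side
            while buf and buf[0] < w_start:
                buf.popleft()
        if v >= w_start:       # ignore values in a gap (when step > window)
            buf.append(v)
    if buf:                    # the last window, if not empty
        yield list(buf)


li = [1, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]
for w in value_windows(li, window=5, step=2):
    print(w)
```

Because the values that stay in the buffer are shared between consecutive windows, there is no need for a separate next_dat list per overlapping window, which is where the code above seems to lose track.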
Any ideas on how to fix this behaviour?
Please note that it is the list values themselves, not the list indexes, that the window slides over. I believe these two approaches differ in the particular case where some values are missing from the list. In the case shown above, the first three items in the list are 1, 3, 4. I think that iterating over the indexes (window=2 and step=2) would result in the following output (but this is not tested):
[1, 3]
[4]
whereas what I would like to do is to iterate over the values of the list, so that the resulting windows would be:
[1]
[3, 4]
So the value 2 is missing from the first window because it wasn't in the original list.
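The difference between the two approaches can be checked with a small in-memory comparison. This is only an illustration on the three-item example, not a solution for large inputs; windows_by_index and windows_by_value are hypothetical names, and the value-based version assumes a sorted list and a first window starting at 1.

```python
def windows_by_index(items, window, step):
    """Slide over positions: plain slicing, ignores the values themselves."""
    return [items[i:i + window] for i in range(0, len(items), step)]


def windows_by_value(items, window, step, start=1):
    """Slide over values: each window covers [w_start, w_start + window - 1].

    Brute force for illustration; assumes `items` is sorted and that
    the first window starts at `start` (assumed to be 1).
    """
    if not items:
        return []
    out = []
    w_start = start
    while True:
        w_end = w_start + window - 1
        chunk = [v for v in items if w_start <= v <= w_end]
        if chunk:                  # skip windows that contain no values
            out.append(chunk)
        if w_end >= items[-1]:     # the window has reached the last value
            break
        w_start += step
    return out


first_three = [1, 3, 4]
print(windows_by_index(first_three, window=2, step=2))  # [[1, 3], [4]]
print(windows_by_value(first_three, window=2, step=2))  # [[1], [3, 4]]
```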
Although this is illustrated here with a list, in the end I will want to read these values from a huge file that will hardly fit into memory.
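For the file case, the values could be fed to the windowing loop lazily instead of being loaded into a list first. The sketch below assumes a file format of one integer per line, which is purely an assumption for illustration; read_sorted_ints is a made-up name, and the demo writes a small temporary file just to have something to read.

```python
import os
import tempfile


def read_sorted_ints(path):
    """Lazily yield integers from a text file, one per line (assumed format)."""
    last = None
    with open(path) as fh:
        for line in fh:            # the file is read line by line
            line = line.strip()
            if not line:
                continue           # tolerate blank lines
            value = int(line)
            if last is not None and value <= last:
                raise ValueError('File is not sorted')
            last = value
            yield value


# Demo: write the example list to a temporary file and stream it back.
with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False) as tmp:
    tmp.write('\n'.join(str(v) for v in [1, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]))
    demo_path = tmp.name

stream = read_sorted_ints(demo_path)
first_three = [next(stream) for _ in range(3)]  # only three lines consumed so far
rest = list(stream)
os.remove(demo_path)
print(first_three, rest)
```

Since the generator only ever holds the current line, any windowing code written as a plain `for e in ...` loop could consume it in place of the list.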