I am trying to build a sliding window that slides over the numerical values of the elements in a list, not over their positions. This is important and, I believe, different from other sliding-window approaches found on SO, where the window usually slides over the list indexes.

What I mean is something like the following. Having the list of integers

li = [1, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]

with window=3 and step=2, the expected output would be:

[1, 3]
[3, 4, 5]
[5, 6, 7]
[7, 8, 9]
[9, 10, 11]
[11, 12]

The code I have so far:

window = 3
step = 2

last_pos = 0
w_start = 1
w_end = window
next_start = w_start + step
dat = []  # values for window
next_dat = []  # values for the next window

li = [1, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]

for e in li:
    ipos = int(e)
    if ipos > last_pos:
        dat.append(ipos)

        if ipos == w_end:  # end of window
            w_start += step
            w_end += step
            print(dat)
            dat = next_dat  # reset window...

        if ipos >= next_start:  # ipos is in the next window
            next_dat.append(ipos)

        if w_start == next_start:  # move next window
            next_start += step
            next_dat = []  # reset next window...
    else:
        raise Exception('List is not sorted')

    last_pos += 1

# the last window if not empty
if dat:
    print(dat)

The output is as expected:

[1, 3]
[3, 4, 5]
[5, 6, 7]
[7, 8, 9]
[9, 10, 11]
[11, 12]

However, besides not being very elegant, this code fails when more than two windows overlap. For example, with window=5 and step=2 it produces the wrong output:

[1, 3, 4, 5]
[3, 4, 5, 6, 7]
[6, 7, 8, 9]
[8, 9, 10, 11]
[10, 11, 12]

The first and second windows are OK, but from the third onwards things get messy. For example, the third window should start at 5 and have five elements, not four. I'm aiming to get the following windows instead:

[1, 3, 4, 5]
[3, 4, 5, 6, 7]
[5, 6, 7, 8, 9]
[7, 8, 9, 10, 11]
[9, 10, 11, 12]

Any ideas on how to fix this behaviour?

Please note that the window slides over the list values themselves, not over the list indexes. I believe these two approaches differ in the particular case where some values are missing from the list. In the case shown above, the first three items in the list are 1, 3, 4. I think that iterating over the indexes (window=2 and step=2) would result in the following output (but this is not tested):

[1, 3]
[4]

whereas what I would like to do is to iterate over the values of the list, so that the resulting windows would be:

[1]
[3, 4]

So the value 2 is missing from the first window because it wasn't in the original list.
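
A small brute-force sketch can check this difference. The function names `index_windows` and `value_windows` are made up for illustration, and the value-based version assumes that the first window starts at 1 and that the list is not empty. It rescans the whole list for every window, so it is only meant as a reference for the expected output, not as a solution for large inputs:

```python
def index_windows(li, window, step):
    """Slide over list positions: each window holds up to `window` consecutive items."""
    return [li[i:i + window] for i in range(0, len(li), step)]


def value_windows(li, window, step, first=1):
    """Slide over values: window k covers first + k*step .. first + k*step + window - 1."""
    out = []
    start = first
    last = max(li)  # assumes a non-empty list
    while True:
        out.append([x for x in li if start <= x < start + window])
        if start + window - 1 >= last:  # this window already reaches the last value
            break
        start += step
    return out


li = [1, 3, 4]
print(index_windows(li, window=2, step=2))  # [[1, 3], [4]]
print(value_windows(li, window=2, step=2))  # [[1], [3, 4]]
```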

Although this is illustrated here with a list, in the end I want to read the values from a huge file that will hardly fit into memory.

PedroA
  • change `if ipos > last_pos:` to `if ipos >= last_pos:` – eyllanesc Jun 02 '17 at 23:54
  • Sorry, I got confused, change `ipos == w_end` to `ipos > w_end` – eyllanesc Jun 02 '17 at 23:59
  • Ah, almost there. That fixes the start of the windows but then messes with everything else. – PedroA Jun 03 '17 at 00:02
  • You could explain your requirements better by pointing out possible inputs and possible outputs – eyllanesc Jun 03 '17 at 00:04
  • @eyllanesc Sorry, but I believe that is explained in the post. I listed the input, which for exemplification purposes is the list li = [1, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12] and the desired output is there already. Is there anything that might be useful to illustrate the problem? – PedroA Jun 03 '17 at 00:08
  • It is not clear how sliding over the indices is different from what you want, considering that the numerical values are ordered. Also, it isn't obvious what is wrong in the output you are getting. – Antimony Jun 03 '17 at 00:41
  • OK. I edited the question, trying to highlight this difference. I believe these two approaches are different in the particular case that some values are missing from the list. – PedroA Jun 03 '17 at 01:09

1 Answer

The problem with the code in the question is that it is not known beforehand how many overlapping windows have to be tracked: two buffers (dat and next_dat) are only enough while at most two windows overlap. A better approach for this task is probably to use a single list for the current window and, when the window ends, copy the values that overlap with the next window into a new list, and so forth.
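
To make the overlap concrete: with windows starting at 1, window k (counting from 0) covers the values 1 + k*step through k*step + window, so one value can belong to several windows at once. The helper below is only an illustration under that assumption; the name `windows_containing` and the `first` parameter are made up here:

```python
def windows_containing(value, window, step, first=1):
    """Return the 0-based indices k of every window that contains `value`.

    Window k covers first + k*step .. first + k*step + window - 1.
    """
    offset = value - first
    k_max = offset // step  # last window that starts at or before value
    # first window that still reaches value (ceiling division, clamped at 0)
    k_min = max(0, -(-(offset - window + 1) // step))
    return list(range(k_min, k_max + 1))


print(windows_containing(5, window=3, step=2))  # [1, 2]
print(windows_containing(5, window=5, step=2))  # [0, 1, 2]
```

With window=5 and step=2 the value 5 sits in three windows at the same time, which is more than two fixed buffers can hold.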

The code below works for all the window settings I tested:

window = 3
step = 2

last_pos = 0
w_start = 1
w_end = window
dat = []  # values for the current window

li = [1, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]

for e in li:
    ipos = int(e)
    if ipos <= last_pos:
        raise ValueError('List is not sorted')

    # ipos is past the end of the window: print it and move on.
    # A loop is used so that a gap wider than one step advances the
    # window bounds as many times as needed.
    while ipos > w_end:
        if dat:
            print(dat)
        w_start += step
        w_end += step
        # keep only the values that overlap with the next window
        # (this gives an empty list when the windows do not overlap)
        dat = [x for x in dat if x >= w_start]

    if ipos >= w_start:  # can only be false when step > window
        dat.append(ipos)
    last_pos = ipos  # compare the next value against this one

# the last window if not empty
if dat:
    print(dat)

(window=3 and step=2)

[1, 3]
[3, 4, 5]
[5, 6, 7]
[7, 8, 9]
[9, 10, 11]
[11, 12]

(window=2 and step=2)

[1]
[3, 4]
[5, 6]
[7, 8]
[9, 10]
[11, 12]

(window=5 and step=2)

[1, 3, 4, 5]
[3, 4, 5, 6, 7]
[5, 6, 7, 8, 9]
[7, 8, 9, 10, 11]
[9, 10, 11, 12]

Again, I don't think this code is very elegant, but it does the job, so I'll mark this answer as accepted. However, I'm still open to any improvements or advice for this code.
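
One possible refinement, given that the values will eventually come from a huge file, is to wrap the same idea in a generator so that only the current window is held in memory. This is a sketch rather than a tested solution: the name `iter_value_windows` and the `first` parameter are made up here, and it assumes strictly increasing integers that are not smaller than `first`.

```python
from collections import deque


def iter_value_windows(values, window, step, first=1):
    """Yield one list per non-empty window sliding over the values themselves.

    Window k covers first + k*step .. first + k*step + window - 1.
    `values` can be any iterable (a list, a file object yielding lines, ...).
    """
    buf = deque()  # values inside the current window
    w_start = first
    w_end = first + window - 1
    last = None
    for raw in values:
        v = int(raw)
        if last is not None and v <= last:
            raise ValueError('input is not strictly increasing')
        last = v
        # Close every window that ends before v; the loop copes with
        # gaps that span more than one step.
        while v > w_end:
            if buf:
                yield list(buf)
            w_start += step
            w_end += step
            # drop the values that fell off the left edge of the window
            while buf and buf[0] < w_start:
                buf.popleft()
        if v >= w_start:  # can only be false when step > window
            buf.append(v)
    if buf:
        yield list(buf)


li = [1, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]
for w in iter_value_windows(li, window=5, step=2):
    print(w)
```

Because the function only iterates over its input, a file object opened on a file with one integer per line (for example a hypothetical positions.txt) could be passed instead of the list, and at most one window of values would be buffered at a time.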

PedroA