efficient ways to find start and end indexes for sequences?

Question

So let's say I have a list in Python:

[0, 0, 0, 0, 1, 1, 0, 0, 1, 1, 1, 1, 1, 1, 0, 0, 1, 0, 1, 1, 1, 1]

and I want to find the indices where each run of 1's starts and ends, e.g.:

[[4, 5], [8, 13], [16, 16], [18, 21]]

What is an efficient way to do this in Python? All I can think of is an expensive for loop that looks forward or backward whenever the current element is a 1 and the previous one was a 0. Perhaps a cumsum? But that would still require a loop.
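One loop-free option is a minimal sketch that assumes NumPy is available (the question itself is pure Python) and that the list contains only 0s and 1s. Pad with a 0 on each side and take `np.diff`: a +1 marks the first 1 of a run, and a -1 marks the position just past its last 1. The "compare with the previous element" step then runs inside NumPy instead of a Python loop. The helper name `one_runs` is hypothetical.

```python
import numpy as np

def one_runs(vec):
    """Return [start, end] (inclusive) for every run of 1s in a 0/1 list."""
    a = np.asarray(vec, dtype=np.int8)
    # Pad with a 0 on each side so runs touching either end still get both edges
    padded = np.concatenate(([0], a, [0]))
    edges = np.diff(padded)                  # +1 at a run start, -1 one past a run end
    starts = np.flatnonzero(edges == 1)
    ends = np.flatnonzero(edges == -1) - 1   # step back onto the last 1 of each run
    return np.column_stack((starts, ends)).tolist()

vec = [0, 0, 0, 0, 1, 1, 0, 0, 1, 1, 1, 1, 1, 1, 0, 0, 1, 0, 1, 1, 1, 1]
print(one_runs(vec))  # [[4, 5], [8, 13], [16, 16], [18, 21]]
```

Because starts and ends come back as arrays, a length rule like the one below is a single mask over `ends - starts + 1`.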

Also, would this problem change if I were to add rules? For example, there must be at least 3 consecutive 1's before a run is recorded, e.g.:

[[8, 13], [18, 21]]

Or a change in pattern is only recorded (a run started or stopped) once the new digit has appeared at least 3 times in a row, e.g.:

[[8, 21]]
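Both rules can be sketched in plain Python. The names `runs_of_ones`, `hysteresis_segments`, `min_len` and `target` are hypothetical. For the second rule this sketch assumes "recorded or stopped" means a segment only opens or closes when the other digit repeats at least `min_len` times; shorter runs in between are absorbed into the open segment (so an end index can land on an absorbed 0), and short runs before the first qualifying run are ignored. Under those assumptions it reproduces `[[8, 21]]`.

```python
from itertools import groupby

def runs_of_ones(vec, min_len=1):
    """Single pass: [start, end] (inclusive) for each run of 1s at least min_len long."""
    result = []
    start = None                              # where the current run of 1s began, or None
    for i, x in enumerate(list(vec) + [0]):   # sentinel 0 closes a run at the end of the list
        if x == 1 and start is None:
            start = i
        elif x != 1 and start is not None:
            if i - start >= min_len:          # run covers start .. i-1, so its length is i - start
                result.append([start, i - 1])
            start = None
    return result

def hysteresis_segments(vec, target=1, min_len=3):
    """A change of value only counts once the new value repeats at least min_len times;
    shorter runs are absorbed into whatever segment is currently open."""
    segments = []
    state = None     # current stable value (None until the first run of length >= min_len)
    seg_start = 0    # index where the current stable segment began
    pos = 0          # index of the first element of the current run
    for key, group in groupby(vec):
        length = sum(1 for _ in group)
        if length >= min_len and key != state:
            if state == target:               # close the open target segment just before this run
                segments.append([seg_start, pos - 1])
            state, seg_start = key, pos
        pos += length
    if state == target:                       # a target segment still open at the end of the list
        segments.append([seg_start, pos - 1])
    return segments

vec = [0, 0, 0, 0, 1, 1, 0, 0, 1, 1, 1, 1, 1, 1, 0, 0, 1, 0, 1, 1, 1, 1]
print(runs_of_ones(vec, min_len=3))  # [[8, 13], [18, 21]]
print(hysteresis_segments(vec))      # [[8, 21]]
```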

My real problem involves lists of characters, but the core of it boils down to this.
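Since `itertools.groupby` only compares neighbouring items with `==`, a run-finding helper does not care whether the items are digits or characters. A minimal sketch (the generator name `runs` is hypothetical) that yields each run's value with its inclusive start and end, so any rule becomes a filter in a comprehension:

```python
from itertools import groupby

def runs(seq):
    """Yield (value, start, end) for each maximal run of equal items; end is inclusive."""
    start = 0
    for value, group in groupby(seq):
        length = sum(1 for _ in group)   # count items without building a list
        yield value, start, start + length - 1
        start += length

text = "aaabbaccccb"
print([[s, e] for v, s, e in runs(text) if v == "c"])                    # [[6, 9]]
print([[s, e] for v, s, e in runs(text) if v == "a" and e - s + 1 >= 3])  # [[0, 2]]
```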

My inefficient solution is to scan through the list after padding it with a 0 on each side:

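vec = [0, 0, 0, 0, 1, 1, 0, 0, 1, 1, 1, 1, 1, 1, 0, 0, 1, 0, 1, 1, 1, 1]

answer = []
record_flag = 0
vec = [0] + vec + [0]  # pad so runs at either end still see a 0->1 and a 1->0 transition
for i in range(len(vec)-1):
    if vec[i] == 0 and vec[i+1] == 1:
        start = i      # padded index i+1 is original index i
        record_flag += 1
    elif vec[i] == 1 and vec[i+1] == 0:
        end = i-1      # padded index i is original index i-1
        record_flag += 1

    if record_flag >= 2:  # both a start and an end seen: one complete run
        record_flag = 0
        answer.append([start, end])

print(answer)  # [[4, 5], [8, 13], [16, 16], [18, 21]]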

And if I need to add rules, I'd just insert them manually into the if-statements. But this doesn't feel very Pythonic. Any advice?

I believe [this](https://stackoverflow.com/questions/61760669/numpy-1d-array-find-indices-of-boundaries-of-subsequences-of-the-same-number) will help you with the first question; it relies on bitwise operations. You can also check [this](https://stackoverflow.com/questions/50151417/numpy-find-indices-of-groups-with-same-value) — Hozayfa El Rifai, May 13 '20 at 01:33

score 1 · Answer 1 · answered May 13 '20 at 01:58

Here's a method that uses itertools.groupby to solve your first two questions. It first finds the segments and their lengths, then computes the start position of each segment. Finally, the segments are filtered in the two print statements: first by value, then by value and length.

from itertools import groupby

vec = [0, 0, 0, 0, 1, 1, 0, 0, 1, 1, 1, 1, 1, 1, 0, 0, 1, 0, 1, 1, 1, 1]

# find sequences and lengths
# seqs = [(key, length), ...]
seqs = [(key, len(list(val))) for key, val in groupby(vec)]
# find start positions of sequences
# seqs = [(key, start, length), ...]
seqs = [(key, sum(s[1] for s in seqs[:i]), length) for i, (key, length) in enumerate(seqs)]

print([[s[1], s[1] + s[2] - 1] for s in seqs if s[0] == 1])
print([[s[1], s[1] + s[2] - 1] for s in seqs if s[0] == 1 and s[2] > 2])

Output:

[[4, 5], [8, 13], [16, 16], [18, 21]]
[[8, 13], [18, 21]]
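The start-position step above re-sums the whole prefix for every segment, so it grows quadratically with the number of segments. A sketch of the same method with `itertools.accumulate`, which produces all the running totals in one pass (the `initial=` argument assumes Python 3.8 or later):

```python
from itertools import accumulate, groupby

vec = [0, 0, 0, 0, 1, 1, 0, 0, 1, 1, 1, 1, 1, 1, 0, 0, 1, 0, 1, 1, 1, 1]

# (key, length) for each run
seqs = [(key, sum(1 for _ in grp)) for key, grp in groupby(vec)]
# Each run starts at the total length of the runs before it: 0, 4, 6, 8, ...
starts = accumulate((length for _, length in seqs), initial=0)
# zip stops at the shorter input, dropping the final total (the list length)
seqs = [(key, start, length) for (key, length), start in zip(seqs, starts)]

print([[s, s + n - 1] for k, s, n in seqs if k == 1])
print([[s, s + n - 1] for k, s, n in seqs if k == 1 and n > 2])
```

It prints the same two lists as the original version.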
