Numpy 1D array - find indices of boundaries of subsequences of the same number

Question

I have a numpy.array made up of zeros and ones, e.g.:

import numpy
a = numpy.array([0, 0, 0, 1, 1, 1, 1, 0, 0, 1, 1, 0, 0, 0, 1, 1, 1, 1])

Now I need to get the first and last index of each sequence of ones. If I use where, I get the index of every 1 in the array:

ones = numpy.where(a == 1)
# ones = (array([ 3,  4,  5,  6,  9, 10, 14, 15, 16, 17], dtype=int64),)

But I would like to get only the boundaries, that is:

# desired:
ones = (array([ 3,  6,  9, 10, 14, 17], dtype=int64),)

Could you please help me achieve this result? Thank you.

Perhaps diff and then find the nonzero indices? Those would indicate a value change. — Nelewout, May 12 '20 at 19:51
I am sorry, it sounds like it should be simple, but would you be so kind and python me a solution? I will check it as the answer, but I don't know what do you precisely mean. I am a numpy newbie. — Honza, May 12 '20 at 19:56
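
The diff idea suggested in the comment above could be sketched roughly as follows. This is only an illustration and not part of the accepted answer: the zero padding and the names padded, changes, starts and ends are assumptions introduced here so that runs touching either end of the array are also detected.

```python
import numpy as np

a = np.array([0, 0, 0, 1, 1, 1, 1, 0, 0, 1, 1, 0, 0, 0, 1, 1, 1, 1])

# Pad with zeros so a run at the very start or end still produces a change.
padded = np.concatenate(([0], (a == 1).astype(int), [0]))
changes = np.diff(padded)  # +1 where a run starts, -1 just after a run ends

starts = np.flatnonzero(changes == 1)      # first index of each run
ends = np.flatnonzero(changes == -1) - 1   # last index of each run

# Merge both sets of indices into one sorted array of boundaries.
boundaries = np.unique(np.concatenate((starts, ends)))
print(boundaries)
```

For the example array this should give the boundaries 3, 6, 9, 10, 14 and 17.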

yatu · Accepted Answer · 2020-05-12T20:17:59.660

You can find the beginning and end of these sequences by comparing shifted copies of the array with bitwise operators, and then use np.where to get the corresponding indices:

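import numpy as np

def first_and_last_seq(x, n):
    # Pad both ends with a value different from n so runs at the edges are detected
    a = np.r_[n-1,x,n-1]
    a = a==n
    # True where a run starts (False followed by True)
    start = np.r_[False,~a[:-1] & a[1:]]
    # True where a run ends (True followed by False)
    end = np.r_[a[:-1] & ~a[1:], False]
    # Subtract 1 to undo the index shift introduced by the padding
    return np.where(start|end)[0]-1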

Checking with the proposed example:

a = np.array([0, 0, 0, 1, 1, 1, 1, 0, 0, 1, 1, 0, 0, 0, 1, 1, 1, 1])
first_and_last_seq(a, 1)
# array([ 3,  6,  9, 10, 14, 17])

Or with the following array:

a = np.array([5,5,5,6,2,3,5,5,5,2,3,5,5])
first_and_last_seq(a, 5)
# array([ 0,  2,  6,  8, 11, 12])

Further details:

A simple way to check for consecutive values in numpy is to use bitwise operators to compare shifted versions of an array. Note that ~a[:-1] & a[1:] is doing precisely that. The first term is the array sliced up to, but not including, the last element, and the second term is a slice from the second element (index 1) onwards.

Note that a is a boolean array, given a = a==n. In the above case we take a NOT of the first shifted boolean array (since we want a True if the value is False). By taking a bitwise AND with the next value, we only get True if the next sample is True. This way we set to True only the indices where a sequence starts (i.e. we've matched the subsequence [False, True]).
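
To make the shifted comparison concrete, here is a small sketch on a hand-made boolean array. The names mask, prev_is_false, next_is_true and rises are made up for this illustration, and no padding is applied, so the result has one element fewer than mask.

```python
import numpy as np

# Plays the role of a == n in the answer's function.
mask = np.array([False, True, True, False, True])

prev_is_false = ~mask[:-1]  # element i is True when mask[i] is False
next_is_true = mask[1:]     # element i is mask[i + 1]

# True at position i means mask[i] is False and mask[i + 1] is True,
# i.e. the pair [False, True], so a run starts at index i + 1.
rises = prev_is_false & next_is_true   # True, False, False, True
run_starts = np.flatnonzero(rises) + 1
print(run_starts)                      # runs start at indices 1 and 4
```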

The same logic applies for end, which matches the subsequence [True, False]. By taking an OR of both arrays and calling np.where on the result, we get all start and end indices.
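
One consequence of OR-ing the two masks is that a run of length one shows up as a single index, because its start and end coincide. If explicit pairs are preferred, a possible variation is sketched below. The helper name run_bounds is hypothetical, and padding with False instead of n-1 is a choice made for this sketch rather than what the answer above does.

```python
import numpy as np

def run_bounds(x, n):
    """Return [first, last] index pairs, one row per run of n in x."""
    # Pad the boolean mask with False so runs at the edges are detected.
    mask = np.r_[False, np.asarray(x) == n, False]
    # [False, True] at padded positions i, i + 1 -> run starts at original index i.
    starts = np.flatnonzero(~mask[:-1] & mask[1:])
    # [True, False] at padded positions i, i + 1 -> run ends at original index i - 1.
    ends = np.flatnonzero(mask[:-1] & ~mask[1:]) - 1
    return np.column_stack((starts, ends))

a = np.array([0, 0, 0, 1, 1, 1, 1, 0, 0, 1, 1, 0, 0, 0, 1, 1, 1, 1])
print(run_bounds(a, 1))
```

For the question's array this should yield the rows [3, 6], [9, 10] and [14, 17], and a lone 1 at index k would appear as the row [k, k].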

Wow, it is working like a charm. But, could you, please, a bit explain it? I don't understand, why it is working and what your source code really does. — Honza, May 12 '20 at 20:06
