
I've found similar questions posted here, but none that apply to row-defined time-series data. I'm anticipating the solution might be found via NumPy or SciPy. Because I have so much data, I'd prefer not to use pandas DataFrames.

I have many runs of 19-channel EEG data stored in 2d numpy arrays. I've gone through and marked noisy data as nan, so a given run might look something like:

C1  C2  C3  C4  C5  C6  C7  C8  C9  C10  C11  C12  C13  C14  C15  C16  C17  C18  C19
nan 7   5   4   nan nan 7   9   0   -3   nan  2    nan  nan  5    7    6    nan  8
0   6   7   3   5   9   2   2   4   6    8    7    5    6    4    -1   nan  -8   -9
6   8   7   7   0   3   2   4   5   1    3    7    3    8    4    6    9    0    0
...
nan nan nan 3   5   -1  0   nan nan nan  1    2    0    -1   -2   nan  nan  nan  nan

(without channel labels)

Each run is between 80,000 and 120,000 rows (cycles) long.

For each of these runs, I want to create a new stack of contiguous non-overlapping epochs where no values were artifacted to nan. Something like:

import numpy as np

def generate_contigs(run, length):
    contigs = []        # will become a stack of (length x 19) arrays
    count = 0
    for i, row in enumerate(run):
        if not np.any(np.isnan(row)):       # i.e. "if nan not in row"
            count += 1
            if count == length:
                # stack the last `length` rows onto the output
                contigs.append(run[i - length + 1 : i + 1])
                count = 0
        else:
            count = 0
    return np.array(contigs)

Say, for example, that I specified length 4 (arbitrarily small), and that my function found 9 non-overlapping contigs in which no value was nan for 4 straight rows.

My output should look something like:

contigs = [
[4x19 array],
[4x19 array],
[4x19 array],
[4x19 array],
[4x19 array],
[4x19 array],
[4x19 array],
[4x19 array],
[4x19 array]
]

Where each element in the output stack resembles the following:

[4 6 5 8 3 5 4 1 8 8 7 5 6 4 3 5 6 6 5]  
[5 5 7 2 2 9 8 7 7 8 3 0 7 4 4 6 3 7 3]  
[4 4 6 7 9 0 9 9 8 8 7 7 6 6 5 5 4 4 3]  
[1 2 3 4 5 4 3 6 5 4 3 7 6 5 8 7 6 9 8]

Where the 4 rows contained in that element appeared consecutively in the original run's data array.

I feel like I'm pretty close here, but I'm struggling with the row operations and minimizing iteration. Bonus points if you can find a way to attach the start/stop row indices as a tuple for later analysis.

  • Could you clarify how should be the expected output? Maybe adding the expected output corresponding to the sample data you've shown? – Valentino Oct 01 '19 at 22:32
  • Now it is clearer. Just one thing: say you have 2 rows without `nan`, then a row with a `nan`. Should those two rows be discarded, or should they become part of the stack after skipping the row with the `nan` value? – Valentino Oct 01 '19 at 23:03
  • Correct, the contig should only be added to the stack if it meets the specified length. After adding the contig, the count would be reset to 0 upon iter to the next row. – Clayton Schneider Oct 01 '19 at 23:05

1 Answer


You could use NumPy's indexing options to slide over the array and check whether a selection of the proper size, length x 19, contains any nan value, using numpy.isnan and numpy.any.
If there is no nan value, add the selection to the contigs list and jump ahead by length; if there is a nan, advance the index by 1 instead and check whether the new selection is free of nan.
Along the way it is easy to store the index of the first row of each stacked selection.

import numpy as np

def generate_contigs(run, length):
    i = 0
    contigs = []
    startindexes = []
    # <= so a window ending on the very last row is not skipped
    while i <= run.shape[0] - length:
        stk = run[i:(i + length), :]
        if not np.any(np.isnan(stk)):   # no nan anywhere in this window
            contigs.append(stk)
            startindexes.append(i)
            i += length                 # jump past the accepted contig (non-overlapping)
        else:
            i += 1                      # slide forward one row
    return contigs, startindexes
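As a quick sanity check, here is a minimal, self-contained run of this approach on a toy array (3 channels instead of 19, with a nan artifact in one row); the `spans` list is my own addition to cover the start/stop "bonus points" request. Note the loop bound is written as `<=` so a contig ending on the last row is not missed:

```python
import numpy as np

def generate_contigs(run, length):
    """Collect non-overlapping nan-free windows of `length` rows."""
    i = 0
    contigs = []
    startindexes = []
    while i <= run.shape[0] - length:   # <= so the final window is checked
        stk = run[i:i + length, :]
        if not np.any(np.isnan(stk)):
            contigs.append(stk)
            startindexes.append(i)
            i += length                 # jump past the accepted contig
        else:
            i += 1                      # slide forward one row past the nan
    return contigs, startindexes

# Toy run: 6 cycles x 3 channels, with a nan artifact in row 2
run = np.array([
    [1.0, 2.0, 3.0],
    [4.0, 5.0, 6.0],
    [np.nan, 7.0, 8.0],
    [9.0, 1.0, 2.0],
    [3.0, 4.0, 5.0],
    [6.0, 7.0, 8.0],
])

length = 2
contigs, starts = generate_contigs(run, length)
# start/stop row index tuples for later analysis
spans = [(s, s + length) for s in starts]
print(spans)   # [(0, 2), (3, 5)]
```

With `length=2`, rows 0-1 form the first contig; the window over rows 2-3 is rejected because of the nan, so the search slides to row 3, and rows 3-4 form the second contig.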
Valentino