
I have an array y_filtered that contains some masked values. I want to replace these values with values calculated from their neighbouring values. I can get the indices of the masked values with masked_slices = ma.clump_masked(y_filtered), which returns a list of slices, e.g. [slice(194, 196, None)].
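For reference, a minimal example (not my actual data) of what clump_masked returns:

import numpy as np
import numpy.ma as ma

y = ma.array([1.0, 2.0, 3.0, 4.0, 5.0], mask=[0, 0, 1, 1, 0])
print(ma.clump_masked(y))  # [slice(2, 4, None)]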

I can easily get the values from my masked array by using y_filtered[masked_slices], and even loop over them. However, I also need the index of each value so I can calculate its new value based on its neighbours. Using enumerate (logically) gives me 0, 1, etc. instead of the indices I need.
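To illustrate the problem (using the slice from above as an example):

sl = slice(194, 196)
for i, y_i in enumerate(y_filtered[sl]):
    print(i)  # prints 0, 1 rather than 194, 195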

Here's the solution I came up with.

import numpy as np
import numpy.ma as ma

# get indices of masked data
masked_slices = ma.clump_masked(y_filtered)

y_enum = [(i, y_i) for i, y_i in zip(range(len(y_filtered)), y_filtered)]

for sl in masked_slices:
    for i, y_i in y_enum[sl]:
        # simplified example calculation
        y_filtered[i] = np.average(y_filtered[i-2:i+2])

This is a very ugly method IMO, and I think there has to be a better way to do it. Any suggestions?

Thanks!

MTV DNA
  • Are you trying to smooth `y_filtered` using the non-masked values around each index, or using both masked and non-masked surrounding values? – jdehesa May 18 '18 at 14:40
  • I ended up using `ma.compressed` to remove all the masked data, and then interpolated the missing values using scipy. I'm still interested in how to do this loop though. – MTV DNA May 18 '18 at 14:56
  • I edited my answer with an efficient solution without the loop. – jdehesa May 18 '18 at 15:26

1 Answer


EDIT:

I figured out a better way to achieve what I think you want to do. This code takes every window of 5 elements, computes its (masked) average, and then uses those values to fill the gaps in the original array. If an index has no unmasked value close enough, it is simply left masked:

import numpy as np
from numpy.lib.stride_tricks import as_strided

SMOOTH_MARGIN = 2
x = np.ma.array(data=[1, 2, 3, 4, 5, 6, 8, 9, 10],
                mask=[0, 1, 0, 0, 1, 1, 1, 1, 0])
print(x)
# [1 -- 3 4 -- -- -- -- 10]

pad_data = np.pad(x.data, (SMOOTH_MARGIN, SMOOTH_MARGIN), mode='constant')
pad_mask = np.pad(x.mask, (SMOOTH_MARGIN, SMOOTH_MARGIN), mode='constant',
                  constant_values=True)
k = 2 * SMOOTH_MARGIN + 1
isize = x.dtype.itemsize
msize = x.mask.dtype.itemsize
x_pad = np.ma.array(
    data=as_strided(pad_data, (len(x), k), (isize, isize), writeable=False),
    mask=as_strided(pad_mask, (len(x), k), (msize, msize), writeable=False))
x_avg = np.ma.average(x_pad, axis=1).astype(x_pad.dtype)
fill_mask = ~x_avg.mask & x.mask
result = x.copy() 
result[fill_mask] = x_avg[fill_mask]
print(result)
# [1 2 3 4 3 4 10 10 10]

(Note that all the values are integers here because x was originally of integer type.)
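If fractional fill values are wanted, one option (just a sketch, using the same data as above) is to build x with a float dtype so the averages are not truncated:

x = np.ma.array(data=np.array([1, 2, 3, 4, 5, 6, 8, 9, 10], dtype=float),
                mask=[0, 1, 0, 0, 1, 1, 1, 1, 0])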

The originally posted code has a few errors. Firstly, it both reads and writes values of y_filtered in the loop, so the results at later indices are affected by previous iterations; this could be fixed by working on a copy of the original y_filtered. Secondly, [i-2:i+2] should probably be [max(i - 2, 0):i + 3], so that the window is always symmetric and never starts before index zero.
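For illustration, here is a sketch of the loop with both fixes applied (y_filtered and masked_slices are assumed to be defined as in your question):

# read from an untouched copy so earlier fills don't affect later windows
y_orig = y_filtered.copy()
for sl in masked_slices:
    for i in range(sl.start, sl.stop):
        # symmetric 5-point window, clipped at the left edge
        y_filtered[i] = np.average(y_orig[max(i - 2, 0):i + 3])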


You could do this:

import numpy as np
import numpy.ma as ma
from itertools import chain

# get indices of masked data
masked_slices = ma.clump_masked(y_filtered)
for idx in chain.from_iterable(range(s.start, s.stop) for s in masked_slices):
    y_filtered[idx] = np.average(y_filtered[max(idx - 2, 0):idx + 3])
jdehesa
  • You're right, my code indeed has some errors. I left out some bits and pieces that were irrelevant, and did not test what I posted here. Apparently I made an oversight. I ended up going with numpy's linear interpolation. Linear interpolation is good enough for my purpose, and I need every data point to have a value. I think numpy's approach of returning a list of slices instead of a list of indices is in some cases a bit unwieldy. On the other hand, if your slice is very big, a list of indices isn't what you want either. Anyways, I learned something today. Thanks! – MTV DNA May 18 '18 at 21:11
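For completeness, a minimal sketch of that interpolation approach (variable names are illustrative; y_filtered is the masked array from the question):

import numpy as np

good = ~y_filtered.mask                 # positions with valid data
idx = np.arange(len(y_filtered))
filled = np.interp(idx, idx[good], y_filtered.compressed())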