What is the difference between hard and soft masked numpy arrays?

Question

I have been browsing the numpy docs for masked arrays (which I really like and use regularly) and found numpy.ma.harden_mask and numpy.ma.soften_mask which affect the MaskedArray.hardmask attribute, but I can't find an explanation of that attribute's purpose.

Hence: What is the difference between hard and soft masked numpy arrays?

You might be more familiar with masked arrays than most of us, if you use them regularly. I, for one, only use them to answer beginners' questions. Most of the `ma` code is written in Python; I don't think it has any compiled additions. So comments in the function code might add information that's not in the docs. — hpaulj, Jan 06 '22 at 18:02
I'd be interested to know too. I use masked arrays regularly as well, and had never heard of this. A quick DuckDuckGo search turned up nothing explaining it at all. — Richard, Jan 06 '22 at 18:07

hans_meine · Answer 1 · 2022-01-06T18:29:58.863

Since @hpaulj pointed out that the ma module is implemented in Python, I looked at the sources and indeed found answers:

The numpy.ma implementation has a "hardmask" feature, which prevents values from ever being unmasked by assigning a value. This would be an internal array flag, named something like 'arr.flags.hardmask'.

If the hardmask feature is implemented, boolean indexing could return a hardmasked array instead of a flattened array with the arbitrary choice of C-ordering as it currently does. While this improves the abstraction of the array significantly, it is not a compatible change.

The following documentation of the hardmask attribute could be found in my old checkout in maskedarray.baseclass.rst, but it vanished after updating, which explains why it is missing on the website:

Returns whether the mask is hard (True) or soft (False). When the mask is hard, masked entries cannot be unmasked.

(I will send a PR suggesting that this sentence be restored and extended with ".. by element assignment".)

Here's a demonstration session:

>>> import numpy
>>> x = numpy.arange(10)
>>> m = numpy.ma.masked_array(x, x>5)
>>> assert not m.hardmask
>>> m[8] = 42
>>> m
masked_array(data=[0, 1, 2, 3, 4, 5, --, --, 42, --],
             mask=[False, False, False, False, False, False,  True,  True,
                   False,  True],
       fill_value=999999)
>>> hardened = numpy.ma.harden_mask(m)
>>> assert hardened.hardmask
>>> assert m.hardmask, 'harden_mask() affects AND returns the argument'
>>> m[9] = 23
>>> m
masked_array(data=[0, 1, 2, 3, 4, 5, --, --, 42, --],
             mask=[False, False, False, False, False, False,  True,  True,
                   False,  True],
       fill_value=999999)
>>> m[:] = 23
>>> m
masked_array(data=[23, 23, 23, 23, 23, 23, --, --, 23, --],
             mask=[False, False, False, False, False, False,  True,  True,
                   False,  True],
       fill_value=999999)
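
To round out the session above, here is a minimal sketch of the other direction, assuming a reasonably recent NumPy. It creates the array with a hard mask from the start via the `hard_mask` constructor keyword, shows that a hard mask can still grow (assigning `numpy.ma.masked` masks further entries), and that after `soften_mask()` a plain assignment unmasks again. The variable names are only for illustration.

```python
import numpy

x = numpy.arange(10)
# hard_mask=True creates the array with a hard mask from the start
m = numpy.ma.masked_array(x, mask=x > 5, hard_mask=True)
assert m.hardmask

# Assigning a value to a masked entry is silently ignored
m[9] = 23
still_masked = bool(m.mask[9])

# A hard mask can still grow: assigning `masked` masks entry 0
m[0] = numpy.ma.masked
newly_masked = bool(m.mask[0])

# After softening, the same assignment unmasks the entry
m.soften_mask()
assert not m.hardmask
m[9] = 23
unmasked_value = int(m[9])
now_unmasked = not bool(m.mask[9])
```

So "hard" only blocks unmasking by element assignment; it does not freeze the mask in the other direction.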

Followed up on in https://github.com/numpy/numpy/issues/19331 — hans_meine, Jan 07 '22 at 13:51
