
I have one simple 3D array a1, and its masked analog a2:

import numpy

a1 = numpy.array([[[ 0.00,  0.00,  0.00],
                   [ 0.88,  0.80,  0.78],
                   [ 0.75,  0.78,  0.77]],

                  [[ 0.00,  0.00,  0.00],
                   [ 3.29,  3.29,  3.30],
                   [ 3.27,  3.27,  3.26]],

                  [[ 0.00,  0.00,  0.00],
                   [ 0.41,  0.42,  0.40],
                   [ 0.42,  0.43,  0.41]]])


a2 = numpy.ma.masked_equal(a1, 0.)

I want to compute the mean of this array over several axes at once (this is a peculiar, undocumented use of the axis argument in numpy.mean; see e.g. here for an example):

numpy.mean(a1, axis=(0, 1))

This works fine with a1, but I get the following error with the masked array a2:

TypeError: tuple indices must be integers, not tuple

I get the same error with the masked-array function, numpy.ma.mean(a2, axis=(0, 1)), and also after trying to unmask the array with a2[a2.mask]=0.

I am using a tuple for the axis argument because it is not hardcoded: the command is applied to arrays with potentially different numbers of dimensions, and the tuple is adapted accordingly.

The problem occurs with numpy versions 1.9.1 and 1.9.2.

ztl
  • Could you provide a cut-and-paste-able example? – Lee May 13 '15 at 08:34
  • According to the [docs](http://docs.scipy.org/doc/numpy/reference/generated/numpy.mean.html), the axis argument is expected to be an int. What does passing a tuple instead of an int do? – ypx May 13 '15 at 08:35
  • Shouldn't you be using the [`ma` version of `mean`](http://docs.scipy.org/doc/numpy/reference/generated/numpy.ma.mean.html) for a masked array argument? – user2357112 May 13 '15 at 08:40
  • @atomh33ls done, sorry – ztl May 13 '15 at 09:18
  • @ypx this can be found elsewhere on SO, see my link in the edited question – ztl May 13 '15 at 09:18
  • @user2357112 No, it doesn't seem to help... – ztl May 13 '15 at 09:18
  • Huh. I was under the impression that the behavior of `numpy.mean` with a tuple for `axis` was supposed to be documented by now, but [it only shows up in the development branch documentation](http://docs.scipy.org/doc/numpy-dev/reference/generated/numpy.mean.html#numpy.mean). I could have sworn it was in the dev branch documentation back when the current release was the dev branch. It looks like `numpy.ma` just doesn't have support for this. – user2357112 May 13 '15 at 09:27
  • Also, [here's the source for `numpy.ma.MaskedArray.mean`](https://github.com/numpy/numpy/blob/v1.9.1/numpy/ma/core.py#L4727). You can see that it doesn't have anything in it to support a tuple for `axis`. It looks like it wouldn't be too difficult to add support, perhaps by making [`numpy.ma.MaskedArray.count`](https://github.com/numpy/numpy/blob/v1.9.1/numpy/ma/core.py#L3979) support a tuple `axis`. – user2357112 May 13 '15 at 09:59
  • Thanks @user2357112. In the absence of support for masked arrays, I could *truly* unmask a2, which does not work with `a2[a2.mask]=0`. The following works: `numpy.mean(numpy.array(a2), axis=(0, 1))` (with `a2[a2.mask]=` first if another value than 0 is required to replace the masked ones). Would you post it as an answer I could accept (since you identified the problem), maybe this will be useful for someone else in the future? – ztl May 28 '15 at 12:46
  • That doesn't skip masked values properly, though. Normally, masked values don't count towards either the numerator or the denominator in `mean`; this is not something you can replicate by filling the masked spots unless you already know the mean. – user2357112 May 28 '15 at 17:29

1 Answer


For a MaskedArray argument, numpy.mean calls MaskedArray.mean, which doesn't support a tuple axis argument. You can get the correct behavior by reimplementing MaskedArray.mean in terms of operations that do support tuples for axis:

def mean(a, axis=None):
    if a.mask is numpy.ma.nomask:
        return super(numpy.ma.MaskedArray, a).mean(axis=axis)

    counts = numpy.logical_not(a.mask).sum(axis=axis)
    if counts.shape:
        sums = a.filled(0).sum(axis=axis)
        mask = (counts == 0)
        return numpy.ma.MaskedArray(data=sums * 1. / counts, mask=mask, copy=False)
    elif counts:
        # Return scalar, not array
        return a.filled(0).sum(axis=axis) * 1. / counts
    else:
        # Masked scalar
        return numpy.ma.masked

or, if you're willing to rely on MaskedArray.sum working with a tuple axis (which you likely are, given that you're using undocumented behavior of numpy.mean),

def mean(a, axis=None):
    if a.mask is numpy.ma.nomask:
        return super(numpy.ma.MaskedArray, a).mean(axis=axis)

    # Use the argument a, not the global a2; MaskedArray.sum skips masked entries
    sums = a.sum(axis=axis)
    counts = numpy.logical_not(a.mask).sum(axis=axis)
    return sums * 1. / counts

where we're relying on MaskedArray.sum to handle the mask.
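As a rough illustration, here is a self-contained sketch of the sum/count idea applied to the array from the question. The name masked_mean is hypothetical (not part of numpy), and unlike the functions above this variant always returns a MaskedArray, which is an assumption made to keep the example short:

```python
import numpy

def masked_mean(a, axis=None):
    """Mean over one or several axes, ignoring masked entries (sketch only)."""
    a = numpy.ma.asarray(a)
    # getmaskarray always returns a full boolean array, even for nomask
    mask = numpy.ma.getmaskarray(a)
    counts = numpy.logical_not(mask).sum(axis=axis)
    sums = a.filled(0).sum(axis=axis)
    # Avoid dividing by zero; those positions are masked in the result anyway
    safe_counts = numpy.where(counts == 0, 1, counts)
    return numpy.ma.MaskedArray(sums * 1. / safe_counts, mask=(counts == 0))

a1 = numpy.array([[[0.00, 0.00, 0.00],
                   [0.88, 0.80, 0.78],
                   [0.75, 0.78, 0.77]],

                  [[0.00, 0.00, 0.00],
                   [3.29, 3.29, 3.30],
                   [3.27, 3.27, 3.26]],

                  [[0.00, 0.00, 0.00],
                   [0.41, 0.42, 0.40],
                   [0.42, 0.43, 0.41]]])
a2 = numpy.ma.masked_equal(a1, 0.)

# Six unmasked values contribute to each output element: about 1.503, 1.498, 1.487
result = masked_mean(a2, axis=(0, 1))

# The plain mean divides by nine instead, because the zeros are counted too
unmasked = numpy.mean(a1, axis=(0, 1))
```

The difference between result and unmasked shows why filling the masked entries with 0 is not a substitute: the zeros still count towards the denominator.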

I have only lightly tested these functions; before using them, make sure they actually work, and write some tests. For example, if the output is 0-dimensional and there are no masked values, whether the output is a 0D MaskedArray or a scalar depends on whether the input mask is nomask or an array of all False. This is the same as the default MaskedArray.mean behavior, but it may not be what you want; I suspect the default behavior is a bug.
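One possible way to write such a test is to compare against a slow brute-force reference that averages the unmasked values slice by slice. The names below (candidate_mean, reference_mean, check) are hypothetical, and candidate_mean is only a compact stand-in for whichever function you actually want to test:

```python
import itertools
import numpy

def candidate_mean(a, axis=None):
    # Stand-in for the function under test (sum/count approach)
    mask = numpy.ma.getmaskarray(a)
    counts = numpy.logical_not(mask).sum(axis=axis)
    sums = a.filled(0).sum(axis=axis)
    return numpy.ma.MaskedArray(sums * 1. / numpy.maximum(counts, 1),
                                mask=(counts == 0))

def reference_mean(a, axes):
    """Brute force: for each index of the kept axes, average the unmasked values."""
    kept_shape = tuple(n for i, n in enumerate(a.shape) if i not in axes)
    out = numpy.ma.masked_all(kept_shape, dtype=float)
    for idx in itertools.product(*[range(n) for n in kept_shape]):
        it = iter(idx)
        # Full slice along reduced axes, fixed position along kept axes
        index = tuple(slice(None) if i in axes else next(it)
                      for i in range(a.ndim))
        values = a[index].compressed()
        if values.size:
            out[idx] = values.mean()  # assignment unmasks this entry
    return out

def check(mean_fn, a, axes):
    got = mean_fn(a, axis=axes)
    want = reference_mean(a, axes)
    assert numpy.array_equal(numpy.ma.getmaskarray(got),
                             numpy.ma.getmaskarray(want))
    assert numpy.allclose(got.filled(0), want.filled(0))

rng = numpy.random.RandomState(0)
mask = rng.rand(3, 4, 5) < 0.3
mask[:, :, 0] = True  # force one fully masked slice for axes (0, 1)
a = numpy.ma.MaskedArray(rng.rand(3, 4, 5), mask=mask)

check(candidate_mean, a, (0, 1))
check(candidate_mean, a, (1, 2))
```

This only covers tuple axes that leave at least one dimension; the 0-dimensional cases mentioned above would need their own checks.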

user2357112