
Something I would really appreciate is the ability to name the dimensions of an array in Python. For example, I have a NumPy array with 3 dimensions, and I regularly need to sum it along one specific dimension.

With an ndarray a, I can write:

np.sum(a, axis=2)

if the relevant dimension is the last one. But I want this to be "position independent": a user can provide any array, as long as they specify which dimension is "DI" (for example, short for "Dimension of Interest"). So basically I would like to be able to write:

np.sum(a, axis="DI")

This is close to what netCDF offers, but I don't want to implement full netCDF support.

M456
François Laenen
  • Greetings and thanks aren't needed. In fact, you can even see the regular expression which was at one point (it's probably improved now) used to remove them [here](http://meta.stackexchange.com/a/93989/163205). – DSM May 06 '13 at 15:55
  • The best way I can think of is to maintain a dictionary with mappings from names to axis numbers. Either that or use nested dicts with arrays at the bottom, but that's probably more trouble than it's worth (and not kosher in numpy as far as I know). – Henry Keiter May 06 '13 at 16:34
  • Do you really need `DI` to be a string? If you let `DI = 2` somewhere in your code, you'd be able to do `np.sum(a, axis=DI)`... – jorgeca May 06 '13 at 19:35
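The dictionary approach suggested in the comments can be sketched as a small helper that receives each array together with its own name-to-axis mapping. That gives the "position independent" behaviour asked for, since every array may put "DI" at a different position. This is only a minimal sketch: `named_sum` and the mapping dicts are hypothetical names, not NumPy API.

```python
import numpy as np


def named_sum(a, axis_of, name):
    """Sum `a` over the axis registered under `name`.

    `axis_of` is a hypothetical per-array dict mapping dimension names
    to axis numbers, supplied by whoever created the array.
    """
    return np.sum(a, axis=axis_of[name])


# "DI" sits at a different position in each array; callers only use the name.
x = np.ones((2, 3, 4))
y = np.ones((3, 2, 4))
x_axes = {"DA": 0, "DI": 1, "DC": 2}
y_axes = {"DI": 0, "DA": 1, "DC": 2}

print(named_sum(x, x_axes, "DI").shape)  # (2, 4)
print(named_sum(y, y_axes, "DI").shape)  # (2, 4)
```

An unknown name raises a `KeyError` from the dict lookup, which is usually the desired failure mode.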

2 Answers


You can write a thin wrapper subclass of np.ndarray. However, keeping the names in sync with the dimensions can be tricky.

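import numpy as np

class NamedArray(np.ndarray):
    """Thin ndarray subclass carrying an optional tuple of dimension names."""

    def __new__(cls, data, dim_names=None, dtype=None):
        # Wrap existing data (np.ndarray(shape) would allocate uninitialized memory).
        obj = np.asarray(data, dtype=dtype).view(cls)
        if dim_names is not None and len(dim_names) != obj.ndim:
            raise ValueError("need exactly one name per dimension")
        obj.dim_names = None if dim_names is None else tuple(dim_names)
        return obj

    def __array_finalize__(self, obj):
        # Runs for views, slices and ufunc outputs, where __init__ is never called.
        self.dim_names = getattr(obj, 'dim_names', None)

    def sum(self, axis=None, **kwargs):
        # Translate a dimension name into its integer axis index.
        if isinstance(axis, str):
            if self.dim_names is None:
                raise ValueError("this array has no dimension names")
            axis = self.dim_names.index(axis)
        result = super().sum(axis=axis, **kwargs)
        # Keep the names aligned with the axes of the result.
        if isinstance(result, NamedArray):
            names = None
            if self.dim_names is not None and isinstance(axis, int):
                if result.ndim == self.ndim:  # keepdims=True
                    names = self.dim_names
                elif result.ndim == self.ndim - 1:  # one axis removed
                    axis %= self.ndim
                    names = self.dim_names[:axis] + self.dim_names[axis + 1:]
            result.dim_names = names
        return result

#regular ndarray, no names
a = NamedArray(np.ones((1, 2, 3)), dtype=np.float32)

#ndarray with dimension names
b = NamedArray(np.ones((1, 2, 3)), ('d1', 'd2', 'd3'), dtype=np.float32)
b.sum(axis='d2')  # same as b.sum(axis=1)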
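To illustrate why keeping names and dimensions in sync is tricky, here is the same bookkeeping done by hand with plain NumPy and a hypothetical name tuple `dims`. Every operation that reorders or removes axes has to reorder or remove the names in the same way; a subclass would need equivalent logic in each method it wants to support.

```python
import numpy as np

dims = ("DA", "DI", "DC")  # hypothetical names, one per axis of a
a = np.arange(24).reshape(2, 3, 4)

# Transposing reorders the axes, so the names must be reordered the same way.
order = (2, 0, 1)
t = a.transpose(order)                  # shape (4, 2, 3)
t_dims = tuple(dims[i] for i in order)  # ("DC", "DA", "DI")

# Summing removes an axis, so its name must be removed as well.
i = dims.index("DI")
s = a.sum(axis=i)                       # shape (2, 4)
s_dims = dims[:i] + dims[i + 1:]        # ("DA", "DC")

# Looking "DI" up in the updated names sums the same data in either layout;
# the transposed result comes out as (DC, DA), hence the final .T.
same = np.array_equal(t.sum(axis=t_dims.index("DI")).T, s)
print(t.shape, t_dims, s.shape, s_dims, same)
```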

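Edit: nowadays a pandas DataFrame is pretty close to what the OP asked for, at least for two-dimensional data: its two axes can be selected by the names "index" and "columns" (e.g. `df.sum(axis="index")`), although those names are fixed rather than user-chosen.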

M456

@M456's idea is clever, but if you have the same naming scheme for several arrays, I think the simpler solution would be just to use a dictionary:

axes = {'DA': 0, 'DB': 1}
a.sum(axes['DA'])

or even just variables:

DA, DB, DC = range(3)
a.sum(DA)

If the dimension of interest is always the last (or penultimate, etc.) axis, just use -1 (or -2, etc.):

a.shape
#(2,3,4)

np.all(a.sum(2) == a.sum(-1))
#True
np.all(a.sum(0) == a.sum(-3))
#True
askewchan
  • Simple and elegant! Several of you have proposed this solution; I should have thought of it too. Yes, @M456's idea is nice, but I will go with the simplest one! Thank you! – François Laenen May 07 '13 at 18:24