How to circumvent the restriction on field names?

Question

If I define a recarray r with a field called data as follows

import numpy
r = numpy.zeros(1, numpy.dtype([('data', 'f8')])).view(numpy.recarray)

the data field will refer to some internal recarray buffer rather than a floating point number. Indeed, running

r.data

yields

<read-write buffer for 0x7f3c10841cf8, size 8, offset 0 at 0x7f3c1083ee70>

rather than [0]. I suspect this happens because recarray already has an attribute called data, so my field of the same name is simply ignored. The same problem occurs with any field name that matches an existing attribute of recarray.

My questions are:

1) Is it possible to circumvent this limitation of recarray, and if so, how?

2) Is this limitation likely to be lifted in the future?

score 3 · Answer 1 · edited May 23 '17 at 10:27

Here is the __getattribute__ method of recarray. Python translates obj.par1 to obj.__getattribute__('par1'), which explains why a field name has to be a valid attribute name when it is used in a recarray.

def __getattribute__(self, attr):
    try:
        return object.__getattribute__(self, attr)  #**
    except AttributeError: # attr must be a fieldname
        pass
    fielddict = ndarray.__getattribute__(self, 'dtype').fields
    try:
        res = fielddict[attr][:2]
    except (TypeError, KeyError):
        raise AttributeError("record array has no attribute %s" % attr)
    obj = self.getfield(*res)
    # if it has fields return a recarray, otherwise return
    # normal array
    if obj.dtype.fields:
        return obj
    if obj.dtype.char in 'SU':
        return obj.view(chararray)
    return obj.view(ndarray)

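The line marked ** explains why obj.data returns the buffer pointer, not your field: ordinary attribute lookup is tried first, and the field names are consulted only if it raises AttributeError. The same applies to 'shape' and 'strides'. This ordering is also what makes it possible to access the array methods. You want the recarray to behave as much like a regular array as possible, don't you?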
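As a minimal sketch of that lookup order (the extra field name value is only there for illustration), attribute access on a shadowed name returns the ndarray attribute, while dictionary-style indexing still reaches the field:

```python
import numpy as np

# 'data' collides with ndarray.data; 'value' is an illustrative, non-colliding name.
r = np.zeros(1, dtype=[('data', 'f8'), ('value', 'f8')]).view(np.recarray)

# Attribute access finds ndarray.data first, so the field is shadowed.
# The result is a buffer or memoryview object, depending on the Python version.
shadowed = r.data
print(type(shadowed))

# Index access does not go through attribute lookup and reaches the field.
field = r['data']
print(field)

# A name that does not collide works with both syntaxes.
print(r.value, r['value'])
```

Indexing with r['data'] is therefore the straightforward way around the collision; only the attribute shortcut is lost.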

The field names in a structured array are like the keys of a dictionary, relatively free-form (though I've never explored the limits). But in a recarray, those names also have to function as attribute names. Attribute names have to be valid variable names; that's a Python constraint.
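A small sketch of that difference, using made-up field names: a structured dtype accepts a name containing a space, and the field stays reachable by indexing even after viewing the array as a recarray, although the name cannot be written as obj.name.

```python
import numpy as np

# 'my field' and '2nd' are illustrative names that are not valid identifiers.
a = np.zeros(2, dtype=[('my field', 'f8'), ('2nd', 'i4')])
a['my field'] = [1.5, 2.5]

rec = a.view(np.recarray)

# Index access works regardless of how the field is named.
print(rec['my field'])

# Neither name can be used with dot syntax in source code.
print('my field'.isidentifier(), '2nd'.isidentifier())
```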

In https://stackoverflow.com/a/32540939/901925 I quote from the genfromtxt docs:

Numpy arrays with a structured dtype can also be viewed as recarray, where a field can be accessed as if it were an attribute. For that reason, we may need to make sure that the field name doesn’t contain any space or invalid character, or that it does not correspond to the name of a standard attribute (like size or shape), which would confuse the interpreter.
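The kind of check this passage describes can be sketched by hand. The helper below, unsafe_field_names, is a hypothetical name and not part of NumPy; it assumes that "standard attribute" means anything listed by dir(numpy.recarray), and it flags field names that would be unreachable through attribute access.

```python
import keyword
import numpy as np

def unsafe_field_names(dtype):
    """Return the field names that cannot be reached as recarray attributes.

    Hypothetical helper, not a NumPy function.
    """
    reserved = set(dir(np.recarray))  # existing attributes and methods
    bad = []
    for name in np.dtype(dtype).names or ():
        if (not name.isidentifier()        # spaces, leading digits, etc.
                or keyword.iskeyword(name)  # e.g. 'class', 'return'
                or name in reserved):       # e.g. 'data', 'shape', 'size'
            bad.append(name)
    return bad

print(unsafe_field_names([('data', 'f8'), ('my field', 'f8'), ('value', 'f8')]))
# ['data', 'my field']
```

Running such a check before calling .view(numpy.recarray) lets you rename or reject colliding fields up front instead of discovering the shadowing later.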

Also a tutorial on Python classes says:

Attribute references use the standard syntax used for all attribute references in Python: obj.name. Valid attribute names are all the names that were in the class’s namespace when the class object was created. https://docs.python.org/2/tutorial/classes.html#tut-object

In my example, I don't care if r fails to behave like an array as long as r.data does. In any case, recarray should throw an exception at construction when the user supplies a dtype with unacceptable field names like data. — Andrey Sokolov, Sep 29 '15 at 05:04
