yaml.dump throws an error on numpy.array type attribute of an object in Python

Question

I'd like my object to compactly print itself out (no loading is needed), so that numpy.array is printed as a regular tuple (in this example). Instead, I see an error message TypeError: data type not understood.

Any idea what causes the error message and (once resolved) how to get this compact output?

class A:
    def __init__(self):
        from numpy import array
        self.a_array = array([1,2,3])

    def __repr__(self):
        from yaml import dump
        return dump(self, default_flow_style=False)

A()

Desired output is something like:

object:A
a_array: 
- 1, 2, 3

Any ideas?

UPDATE: This may work (if implementable): Is there a way to have a yaml representer that replaces any array variable x with its x.tolist() representation?

How is it supposed to display if the array is 2d (or larger)? Are you interested in small, almost trivial arrays that fit on a human-readable line, or big ones (1000s of elements)? — hpaulj, Nov 19 '15 at 07:46
I may have larger arrays, but for now, I'd like to understand how to deal with 1D. I can infer the approach to 2D and larger size data :) Thx for clarifying question. — Oleg Melnikov, Nov 19 '15 at 14:22
`x.tolist()` is the easiest way to change the array into something `yaml` knows how to handle. — hpaulj, Nov 19 '15 at 14:36
Thx! See update. If there is a way to call `tolist()` on `array` variables during `yaml.dump`, it should work cleanly (affecting only `array` types and not any other). — Oleg Melnikov, Nov 19 '15 at 18:49
Have you looked at the `yaml` registration business? You can write a function that handles a particular class of object, and register that with the `yaml` module. You could do that with your whole class, and with things like arrays that aren't handled to your satisfaction. I've seen that in the `pyyaml` docs, but never implemented it myself. http://stackoverflow.com/a/27196166/901925 — hpaulj, Nov 19 '15 at 18:58
Indeed, I thought this might be a feasible solution, but I have limited experience with registering `yaml` representers (and could not find sufficient documentation, in particular about registering `array` structures). — Oleg Melnikov, Nov 19 '15 at 21:02

hpaulj · Accepted Answer · 2015-11-19T22:45:20.750

Are you interested in generating valid yaml, or just using yaml as a way to display your object? Phrases like 'no load is needed' suggest the latter.

But why focus on yaml? Does it natively handle lists or sequences in the way you want?

If I use tolist to turn an array into a list that yaml can dump, I get:

In [130]: a = np.arange(3)
In [131]: print(yaml.dump({'a':a.tolist()},default_flow_style=False))
a:
- 0
- 1
- 2

In [132]: print(yaml.dump({'a':a.tolist()},default_flow_style=True))
{a: [0, 1, 2]}

I could drop the dictionary part. But either way the list part does not display as:

- 1, 2, 3

I don't see how yaml.dump is any improvement over the default array displays:

In [133]: print(a)
[0 1 2]
In [134]: print(repr(a))
array([0, 1, 2])

For 2d arrays (and arrays that can be turned into 2d), np.savetxt gives a compact display, with fmt options to control the details:

In [139]: np.savetxt('test',a[None,:], fmt='%d')
In [140]: cat 'test'
0 1 2

Here I'm actually writing to a file, and displaying that with system cat, but I could also write to a string buffer.
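Writing to a string buffer instead of a file could look like the sketch below. It assumes Python 3 and a NumPy recent enough that np.savetxt accepts a text stream such as io.StringIO; the helper name array_to_text is made up for illustration.

```python
import io
import numpy as np

def array_to_text(a, fmt='%d'):
    """Return the savetxt rendering of a 1d or 2d array as a string."""
    a = np.atleast_2d(a)      # a 1d array becomes a single row
    buf = io.StringIO()       # in-memory text buffer instead of a file
    np.savetxt(buf, a, fmt=fmt, delimiter=', ')
    return buf.getvalue()

print(array_to_text(np.arange(3)))
# 0, 1, 2
```

Each row of the array ends up on its own line, so a 2d array gives one comma-separated line per row.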

But I can do better. savetxt just writes the array, one row at a time, to the file. I could use the same formatting style directly.

I create a fmt string, with a % specification for each item in a (here a 1d array). Then fmt%tuple(...) formats it. That's just straightforward Python string formatting.

In [144]: fmt = ', '.join(['%d']*a.shape[0])
In [145]: fmt
Out[145]: '%d, %d, %d'
In [146]: fmt%tuple(a.tolist())
Out[146]: '0, 1, 2'

I could add a -, indentation, a colon, etc. to that formatting.

================================

import numpy as np

class A:
    def __init__(self, anArray):
        self.a_array = anArray

    def __repr__(self):
        astr = ['object: %s'%self.__class__]
        astr.append('a_array:')
        astr.append(self.repr_array())
        return '\n'.join(astr)

    def repr_array(self):
        a = self.a_array
        if a.ndim==1:
            a = a[None,:]   # treat a 1d array as a single row
        # one '%d' per column, so each row prints on one line
        fmt = ', '.join(['%d']*a.shape[1])
        fmt = '- '+fmt
        astr = []
        for row in a:
            astr.append(fmt%tuple(row))
        astr = '\n'.join(astr)
        return astr

print(A(np.arange(3)))

print(A(np.ones((3,2))))

produces (under Python 3)

object: <class '__main__.A'>
a_array:
- 0, 1, 2

for a 1d array, and

object: <class '__main__.A'>
a_array:
- 1, 1
- 1, 1
- 1, 1

for a 2d array.

=======================================

import numpy as np
import yaml

def numpy_representer_str(dumper, data):
    # first cut ndarray yaml representer: the whole array as one scalar string
    astr = ', '.join(['%s']*data.shape[0])%tuple(data)
    return dumper.represent_scalar('!ndarray:', astr)

def numpy_representer_seq(dumper, data):
    # alternative: the array as a tagged yaml sequence
    return dumper.represent_sequence('!ndarray:', data.tolist())

# register the string version and dump
yaml.add_representer(np.ndarray, numpy_representer_str)
print(yaml.dump({'a':np.arange(4)},default_flow_style=False))

# registering again replaces the string version with the sequence version
yaml.add_representer(np.ndarray, numpy_representer_seq)
print(yaml.dump({'a':np.arange(4)},default_flow_style=False))

class A:
    def __init__(self, anArray):
        self.a_array = anArray

    def __repr__(self):
        astr = ['object: %s'%self.__class__]
        astr.append('a_array:')
        astr.append(self.repr_array())
        return '\n'.join(astr)

    def repr_array(self):
        # uses whichever ndarray representer was registered last;
        # default_flow_style=None keeps innermost lists inline
        # (that was the default before PyYAML 5.1)
        return yaml.dump(self.a_array, default_flow_style=None)

print(A(np.arange(3)))
print(A(np.arange(6).reshape(2,3)))

With the different styles of numpy representer I get output like:

a: !ndarray: '0, 1, 2, 3'   # the string version

a: !ndarray:         # the sequence version
- 0
- 1
- 2
- 3

object: <class '__main__.A'>     # sequence version with 1d
a_array:
!ndarray: [0, 1, 2]

object: <class '__main__.A'>    # sequence version with 2d
a_array:
!ndarray:
- [0, 1, 2]
- [3, 4, 5]
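If the goal is only a compact display, a variation on the sequence representer is to emit the array as an ordinary yaml list in flow style, with no custom tag. The following is a minimal sketch under that assumption, not a tested recipe for every array type; the name ndarray_flow_representer and the extra name attribute are made up for illustration. Note that add_representer registers the function on PyYAML's default Dumper, so it affects later yaml.dump calls in the same process.

```python
import numpy as np
import yaml

def ndarray_flow_representer(dumper, data):
    # Hypothetical helper: emit the array as an untagged, inline yaml list.
    return dumper.represent_sequence(
        'tag:yaml.org,2002:seq',  # standard sequence tag, so no '!ndarray:' prefix
        data.tolist(),            # plain Python numbers that yaml already handles
        flow_style=True)          # keep the whole array on one line

yaml.add_representer(np.ndarray, ndarray_flow_representer)

class A:
    def __init__(self, an_array):
        self.a_array = an_array
        self.name = 'demo'        # illustrative extra attribute

text = yaml.dump({'a': np.arange(3), 'b': np.arange(6).reshape(2, 3)},
                 default_flow_style=False)
print(text)
# a: [0, 1, 2]
# b: [[0, 1, 2], [3, 4, 5]]

# Dumping a whole object: its attributes stay in block style,
# while the array attribute is printed inline.
obj_text = yaml.dump(A(np.arange(3)), default_flow_style=False)
print(obj_text)
```

Because the tag is dropped, the result loads back as a plain list rather than an array, which is fine for display-only use.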

My object has a complex structure (nested classes, dictionaries, basic data types, etc.). `yaml` seems a better solution than `pprint` or a user-defined recursive structure printout. However, `yaml` prints out `array` one value at a time (as required for loading), whereas for display only I need my printout to look compact (it will *not* be loaded later). — Oleg Melnikov, Nov 19 '15 at 18:43
You could still use my array formatting ideas with `yaml`. http://pyyaml.org/wiki/PyYAMLDocumentation#Constructorsrepresentersresolvers — hpaulj, Nov 19 '15 at 18:58
I added a first cut (two actually) at an array representer. I demo it both as a standalone yaml dump and as part of your object formatting. — hpaulj, Nov 19 '15 at 22:46
Great solution. When I run it, a FutureWarning comes up from representer. Is this avoidable? `C:\...\yaml\representer.py:135: FutureWarning: comparison to 'None' will result in an elementwise object comparison in the future. if data in [None, ()]:` — Oleg Melnikov, Nov 21 '15 at 03:02
From your comment I can only guess at the location and cause of the warning, and I can't reproduce it. My guess is some data value is `None` (but not necessarily the newly added array). — hpaulj, Nov 21 '15 at 03:42
Thanks. The error message was raised from running your code. It doesn't seem to have any explicit None values. Not sure yet what's going on, but I like the solution and accept it :) If you ever discover the source of the error, please let me know. — Oleg Melnikov, Nov 21 '15 at 04:20
There's a line in `SafeRepresenter` that reads `if data in [None, ()]:`, which could end up trying `data==None`, while `data is None` is a better test. I'm not sure why my tests don't raise the warning. Maybe it's some sort of version issue. — hpaulj, Nov 21 '15 at 05:46
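The difference between the two tests in the last comment can be seen without yaml at all. This is a small sketch, assuming a current NumPy where comparing an array with None is elementwise; on the version discussed in the comments the membership test only produced the FutureWarning, so the code below allows for either behaviour.

```python
import numpy as np

arr = np.arange(3)

# 'arr in [None, ()]' compares arr with each list element using ==.
# When == is elementwise, the result is an array with no single truth value.
try:
    found = arr in [None, ()]
except ValueError as exc:
    found = None
    print('membership test failed:', exc)

# An identity test never calls ==, so it is safe for arrays.
print(arr is None)
# False
```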

score 1 · Answer 2 · edited May 23 '17 at 12:34

You can marshal your numpy array to a list when representing in your A object. Then unmarshal it when retrieving from your object:

from numpy import array
from yaml import dump

class A:
    def __init__(self):
        # store the data as a plain list so that yaml can dump it
        self.a_lst = [1,2,3]

    def __repr__(self):
        return dump(self, default_flow_style=False)

    # convert internal list to numpy array before returning.
    @property
    def my_arr(self):
        return array(self.a_lst)

    # convert array to list before storing internally.
    @my_arr.setter
    def my_arr(self, arr):
        self.a_lst = arr.tolist()

print(repr(A()))

The key is to ensure that you are storing the array as a plain python list while inside your object so you can ensure you can do a yaml dump.

A possibly better alternative is to use the built-in dump functionality provided by numpy. See answer here.

Martin Konecny · answered Nov 19 '15 at 03:57 · edited by Community May 23 '17 at 12:34

Thanks Martin. Actually, I need to store it as `array`, but output it as if it was a list. I thought there may be a `yaml.representer` (or simpler) solution for it. All internal computations would be done on `array`, which itself is a product of internal computations. So, storing `array` would be natural. However, `yaml` does not print it nicely (once error is handled) :( – Oleg Melnikov Nov 19 '15 at 04:13
Assuming your object is storing only `a_array` as a member, you could do the following: `a = A(); dump(a.a_array.tolist(), default_flow_style=False)`. – Martin Konecny Nov 19 '15 at 04:18
That'd be too trivial. In actuality, there are other class-scope structures, including nested class compositions (which also need to be printed). For now, I just wanted to focus on nice-printing `array` :) – Oleg Melnikov Nov 19 '15 at 05:08
