
NumPy summarizes large arrays, which is convenient when working in an interactive session. Unfortunately, structured arrays and recarrays are not summarized very well by default. Is there a way to change this?

By default, the full array is displayed if it has 1000 or fewer items; larger arrays are summarized. This can be set with np.set_printoptions(threshold=<number of items to trigger summarization>, edgeitems=<number of items to show at each end of the summary>). This works fine for standard datatypes, for example:

import numpy as np

np.set_printoptions(threshold=3, edgeitems=1)
print(np.zeros(3))
print(np.zeros(4))

results in

[ 0.  0.  0.]
[ 0. ...,  0.]
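
The same two options can also be applied temporarily. A minimal sketch, assuming NumPy 1.15 or later, where the np.printoptions context manager is available; the variable names are only illustrative:

```python
import numpy as np

# Temporarily lower the summarization threshold; the previous
# print options are restored when the with-block exits.
with np.printoptions(threshold=3, edgeitems=1):
    short = str(np.zeros(3))       # 3 items: not above the threshold, printed in full
    summarized = str(np.zeros(4))  # 4 items: above the threshold, summarized

print(short)
print(summarized)
```

This avoids changing the global settings for the rest of the session, which is handy when only one array needs a tighter summary.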

However, when more complex datatypes are used, the summarization is less helpful:

print(np.zeros(4, dtype=[('test', 'i4', 3)]))
print(np.zeros(4, dtype=[('test', 'i4', 4)]))

[([0, 0, 0],) ..., ([0, 0, 0],)]
[([0, 0, 0, 0],) ..., ([0, 0, 0, 0],)]

The array is summarized, but the sub-datatypes are not. This becomes a problem with large arrays using complex datatypes. For instance, displaying the array np.zeros(1000, dtype=[('a', float, 3000), ('b', float, 10000)]) hangs my IPython session.
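
To get a feel for the scale involved, here is a rough sketch that counts the scalar values in that dtype without allocating the array. The helper elements_per_record is a hypothetical name introduced for illustration, and it assumes a flat (non-nested) structured dtype:

```python
import numpy as np

dt = np.dtype([('a', float, 3000), ('b', float, 10000)])


def elements_per_record(dtype):
    """Count scalar values in one record (hypothetical helper; ignores nested structs)."""
    # dtype[name].shape is () for scalar fields, and np.prod(()) is 1.
    return sum(int(np.prod(dtype[name].shape)) for name in dtype.names)


per_record = elements_per_record(dt)
total = 1000 * per_record
print(per_record, total)  # 13000 13000000
```

Since only the outer array is summarized, even the few records shown at each edge carry 13000 values apiece, all of which get formatted.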

There are a couple of workarounds. Rather than using the np.ndarray type directly, it's possible to subclass it and write a custom __repr__. This would work for big projects, but it doesn't solve the underlying issue and isn't convenient for quick exploration of data in an interactive Python session. I've also implemented a custom filter in my editor that truncates very long console output. This is a bit of a hack and doesn't help when I fire up a Python session elsewhere.
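
For reference, the subclass workaround could look roughly like the sketch below. The class name SummaryArray and the output format are assumptions made for illustration, not an existing NumPy feature:

```python
import numpy as np


class SummaryArray(np.ndarray):
    """Hypothetical ndarray subclass that never prints field contents."""

    def __repr__(self):
        if self.dtype.names is None:
            # Plain arrays keep NumPy's normal repr.
            return super().__repr__()
        # Describe each field by base type and sub-array shape only.
        fields = ", ".join(
            f"{name}: {self.dtype[name].base}{self.dtype[name].shape}"
            for name in self.dtype.names
        )
        return f"<{type(self).__name__} shape={self.shape} fields=({fields})>"

    def __str__(self):
        if self.dtype.names is None:
            return super().__str__()
        return repr(self)


# Viewing an existing array as the subclass does not copy any data.
arr = np.zeros(4, dtype=[('a', float, 3), ('b', float, 10)]).view(SummaryArray)
print(arr)
```

Every array has to be viewed as the subclass before it is displayed, which is the inconvenience described above.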

Is there a NumPy setting I'm unaware of, or a Python or IPython setting, that could fix this?

user2699
  • I'm not aware of any such controls or customization. Given the possible diversity, it would take a lot of work to develop a general-purpose formatter. But if there's anything, I'd expect to find it in `np.core.records`, which implements the `recarray` subclass. – hpaulj Jul 24 '18 at 15:43
  • @hpaulj, thanks. That makes sense and indeed I don't see anything in `np.core.records` that looks like it would affect that. – user2699 Jul 26 '18 at 12:29
  • @hpaulj, I've posted an answer with a fairly general formatter. I've been using it for a while without error now. – user2699 Aug 31 '18 at 15:48

1 Answer


Here's a workaround I've come up with that allows for sensible printing of record arrays.

import numpy as np


def count_dtype(dtype):
    """
    dtype : datatype descr (list of strings / tuples, subdtypes rather than dtype object)
    Return total number of elements in array of dtype
    """
    total = 0  # avoid shadowing the builtin sum
    for name, t, *shape in dtype:
        # descr entries are (name, type) or (name, type, shape)
        n = int(np.prod(shape[0], dtype=np.int64)) if shape else 1
        if isinstance(t, str):  ## base datatype
            total += n
        else:  ## nested structured type: count its fields recursively
            total += n * count_dtype(t)
    return total


def _recarray2string(a, options, separator=' ', prefix=""):
    """
    Create a string representation of a record array
    a : record array
    separator : used by _array2string
    prefix : used by _array2string
    """
    # Use the options passed in by the caller rather than re-reading the globals
    threshold = options['threshold']
    edgeitems = options['edgeitems']

    size = count_dtype(a.dtype.descr)
    items = np.multiply.reduce(a.shape)
    if size*items > threshold/(2*edgeitems): ## Too big
        if size > threshold: ## subtype is too large - how to handle?
            newopt = options.copy()
            newopt['threshold'] = options['threshold'] // (2*options['edgeitems'])
            def fmt_subtype(r):
                res = []
                for sub in r:
                    if sub.dtype.names is not None:
                        res.append(fmt_subtype(sub))
                    else:
                        res.append(_array2string(sub, newopt, separator=separator, prefix=prefix))
                return separator.join(res)
            # Leading items, an ellipsis, then trailing items in their original order
            return separator.join(fmt_subtype(a[i]) for i in range(edgeitems)) + '\n...\n' + \
                   separator.join(fmt_subtype(a[a.shape[0]-edgeitems+i]) for i in range(edgeitems))
        else: ## Subtype is small enough it's sufficient to truncate only sub-dtype
            options = options.copy()
            options['threshold'] = threshold // size
            return _array2string_old(a, options, separator=separator, prefix=prefix)
    else:  ## Print as normal
        return _array2string_old(a, options, separator=separator, prefix=prefix)


def _array2string(a, options , separator=' ', prefix=""):
    """
    monkeypatched print function that allows truncating record arrays sensibly
    """
    if a.dtype.names is not None:
        return  _recarray2string(a, options, separator=separator, prefix=prefix)
    else:
        return _array2string_old(a, options, separator=separator, prefix=prefix)


# Set up monkeypatching
# (_array2string is a private NumPy function, so this may break between versions)
_array2string_old = np.core.arrayprint._array2string
np.core.arrayprint._array2string = _array2string
user2699