
Given the following arrays:

name = np.array(['a', 'b', 'c'])
val = np.array([0.4, 0.5, 0.6])
alt = np.array([1.1, 2.1, 3.1])
b = np.array([17.2])

How can I combine them into a recarray (or structured array, same thing) that looks like this: [('a', 'b', 'c'), (0.4, 0.5, 0.6), (1.1, 2.1, 3.1), (17.2)], and where print(arr["name"]) returns ('a', 'b', 'c')?

The actual data has a dozen arrays. There is always one array (b) that only has size of one; the others all have the same size, but that size will vary. So, I'm looking for a solution that is extensible to these conditions. Thank you.

a11
  • What's a "rec array"? – Woodford Feb 24 '23 at 18:21
  • From your question it is not clear whether you specifically need a [NumPy "record array"](https://numpy.org/doc/stable/reference/generated/numpy.recarray.html). – Lover of Structure Feb 24 '23 at 18:32
  • @LoverofStructure I agree it is not clear from the minimal reproducible example that a recarray is needed, but that is the point of an MRE: boil it down to the bare bits. A recarray is needed for the larger scope, so that is specified in the OP. – a11 Feb 24 '23 at 18:51

2 Answers


Define a dtype:

In [41]: dt = np.dtype([('name','U10'),('val','f'),('alt','f'),('b','f')])

Make a zeros array of the desired shape and dtype:

In [43]: arr = np.zeros(3, dt)

Copy the arrays to their respective fields:

In [44]: arr['name']=name; arr['val']=val; arr['alt']=alt    
In [45]: arr['b']=b

And the result:

In [46]: arr
Out[46]: 
array([('a', 0.4, 1.1, 17.2), ('b', 0.5, 2.1, 17.2),
       ('c', 0.6, 3.1, 17.2)],
      dtype=[('name', '<U10'), ('val', '<f4'), ('alt', '<f4'), ('b', '<f4')])

That looks different from what you want, but it is a valid structured array. Yours isn't. And access by field name does what you want:

In [47]: arr['name']
Out[47]: array(['a', 'b', 'c'], dtype='<U10')

The b values have been replicated. You can't make a "ragged" structured array:

In [48]: arr['b']
Out[48]: array([17.2, 17.2, 17.2], dtype=float32)
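
Since the real data has about a dozen arrays, the same steps can be wrapped in a loop. The following is a minimal sketch, assuming the arrays are collected in a dict; the helper name `to_structured` is hypothetical (not part of NumPy), and it takes each field's dtype from the input array instead of fixing 'U10' and 'f' by hand.

```python
import numpy as np

def to_structured(columns):
    """Build a structured array from a dict of 1-D arrays.

    Hypothetical helper: length-1 arrays are replicated to the common length.
    """
    arrays = {key: np.asarray(col) for key, col in columns.items()}
    # The record count is the length of the longest input.
    n = max(a.shape[0] for a in arrays.values())
    # One field per input, reusing the dtype NumPy already inferred for it.
    dt = np.dtype([(key, a.dtype) for key, a in arrays.items()])
    out = np.zeros(n, dt)
    for key, a in arrays.items():
        out[key] = a  # field assignment broadcasts a length-1 array
    return out

name = np.array(['a', 'b', 'c'])
val = np.array([0.4, 0.5, 0.6])
alt = np.array([1.1, 2.1, 3.1])
b = np.array([17.2])

arr = to_structured({'name': name, 'val': val, 'alt': alt, 'b': b})
print(arr['name'])
print(arr['b'])
```

Because the record count is computed from the inputs, this should keep working when the common length changes; an input whose length is neither 1 nor the common length would fail at the field assignment.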

The other answer creates a dict, which gives the same access by key but is a different structure. It may be what you really want.

There are some helper functions, such as np.rec.fromarrays, that create a recarray from a set of arrays, but their action amounts to the same thing. And they (probably) won't work directly with the single-element b.
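
As an illustration of the helper route, here is a sketch using np.rec.fromarrays. It expands b with np.broadcast_to first, on the assumption that the helper wants equally shaped inputs; the variable names b_full and rec are made up for this example.

```python
import numpy as np

name = np.array(['a', 'b', 'c'])
val = np.array([0.4, 0.5, 0.6])
alt = np.array([1.1, 2.1, 3.1])
b = np.array([17.2])

# Expand b to the common length; broadcast_to returns a read-only view,
# which is enough here because the helper only reads from it.
b_full = np.broadcast_to(b, name.shape)

# Field dtypes are inferred from the input arrays.
rec = np.rec.fromarrays([name, val, alt, b_full], names='name,val,alt,b')

print(rec['name'])  # access by field name
print(rec.b)        # a recarray also allows attribute access
```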

You could make the list of tuples with:

In [53]: from itertools import zip_longest
In [54]: [ijk for ijk in zip_longest(name,val,alt,b)]
Out[54]: [('a', 0.4, 1.1, 17.2), ('b', 0.5, 2.1, None), ('c', 0.6, 3.1, None)]
In [55]: np.array(_, dt)
Out[55]: 
array([('a', 0.4, 1.1, 17.2), ('b', 0.5, 2.1,  nan),
       ('c', 0.6, 3.1,  nan)],
      dtype=[('name', '<U10'), ('val', '<f4'), ('alt', '<f4'), ('b', '<f4')])

Though the b fill of None/nan may not be what you want.
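
If the None/nan fill is unwanted, zip_longest accepts a fillvalue argument. The sketch below passes b[0], which relies on the assumption that b is the only short input, since the fill value applies to every input that runs out early.

```python
from itertools import zip_longest

import numpy as np

name = np.array(['a', 'b', 'c'])
val = np.array([0.4, 0.5, 0.6])
alt = np.array([1.1, 2.1, 3.1])
b = np.array([17.2])

dt = np.dtype([('name', 'U10'), ('val', 'f'), ('alt', 'f'), ('b', 'f')])

# Pad the exhausted b column with its single value instead of None.
rows = list(zip_longest(name, val, alt, b, fillvalue=b[0]))
arr = np.array(rows, dt)

print(arr['b'])
```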

You could combine the arrays into one object dtype array, which keeps b at its original length, but the elements are then accessible only by position, not by name. Access by name requires a dict:

In [64]: barr = np.array([name, val, alt, b], dtype=object)
In [65]: barr
Out[65]: 
array([array(['a', 'b', 'c'], dtype='<U1'), array([0.4, 0.5, 0.6]),
       array([1.1, 2.1, 3.1]), array([17.2])], dtype=object)
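
To get name access on top of that object array, the field names can be zipped with its elements. This is a small sketch; the names fields and by_name are introduced here for illustration.

```python
import numpy as np

name = np.array(['a', 'b', 'c'])
val = np.array([0.4, 0.5, 0.6])
alt = np.array([1.1, 2.1, 3.1])
b = np.array([17.2])

fields = ['name', 'val', 'alt', 'b']
# b has a different length, so this yields a 1-D array of four array objects.
barr = np.array([name, val, alt, b], dtype=object)

# Pair each field name with the array stored at the same position.
by_name = dict(zip(fields, barr))

print(by_name['name'])
print(by_name['b'])
```

Unlike the structured array, this keeps b as a single-element array, at the cost of losing record-wise indexing.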
hpaulj

The following solution produces output that closely matches what you say you desire (but it's not a NumPy record array):

import numpy as np

name = np.array(['a', 'b', 'c'])
val = np.array([0.4, 0.5, 0.6])
alt = np.array([1.1, 2.1, 3.1])
b = np.array([17.2])

# map each field name to its array explicitly (avoids eval)
arr = {'name': name, 'val': val, 'alt': alt, 'b': b}

print(arr["name"])

This prints ['a' 'b' 'c']. Note that arr here is a simple dictionary.


An alternative answer that builds a NumPy structured array (the basis of numpy.recarray) would be the following:

import numpy as np

# initialization
name = np.array(['a', 'b', 'c'])
val = np.array([0.4, 0.5, 0.6])
alt = np.array([1.1, 2.1, 3.1])
b = np.array([17.2])

# processing
b = np.repeat(b, len(name))  # repeat the single value to match the other arrays
fields = ['name', 'val', 'alt', 'b']
dt = np.dtype([('name', '<U12')] + [(colname, 'f8') for colname in fields[1:]])
arr = np.array(list(zip(name, val, alt, b)), dt)

print(arr["name"])  # output: ['a' 'b' 'c']

Here, arr evaluates to the following:

array([('a', 0.4, 1.1, 17.2), ('b', 0.5, 2.1, 17.2),
       ('c', 0.6, 3.1, 17.2)],
      dtype=[('name', '<U12'), ('val', '<f8'), ('alt', '<f8'), ('b', '<f8')])
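
If an actual numpy.recarray is required, a structured array like this one can be viewed as one without copying. The sketch below rebuilds the array from the values shown above so that it runs on its own; rec is a name chosen for this example.

```python
import numpy as np

dt = np.dtype([('name', '<U12'), ('val', 'f8'), ('alt', 'f8'), ('b', 'f8')])
arr = np.array([('a', 0.4, 1.1, 17.2),
                ('b', 0.5, 2.1, 17.2),
                ('c', 0.6, 3.1, 17.2)], dt)

# View the same memory as a record array; fields become attributes.
rec = arr.view(np.recarray)

print(rec.name)     # attribute access
print(rec['name'])  # field-name access still works
```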
Lover of Structure