What's the best way to convert numpy's recarray to a normal array?

I could call .tolist() first and then array() again, but that seems somewhat inefficient.

Example:

import numpy as np
a = np.recarray((2,), dtype=[('x', int), ('y', float), ('z', int)])

>>> a
  rec.array([(30408891, 9.2944097561804909e-296, 30261980),
   (44512448, 4.5273310988985789e-300, 29979040)], 
  dtype=[('x', '<i4'), ('y', '<f8'), ('z', '<i4')])

>>> np.array(a.tolist())
   array([[  3.04088910e+007,   9.29440976e-296,   3.02619800e+007],
   [  4.45124480e+007,   4.52733110e-300,   2.99790400e+007]])
Steven Rumbalski
Muppet
  • 1
    You aren't getting any answers because we don't understand your question. Try to reword your question, and include any relevant code. – Steven Rumbalski Oct 20 '11 at 21:14
  • 4
    To the down-voters I ask that you be a little more patient. This is a person who hasn't asked questions here before and hasn't had much time to revise the question. If the question stays in this poor form for too long, by all means down-vote it. – Steven Rumbalski Oct 20 '11 at 21:17
  • ok sorry guys, added an example. is this clearer? – Muppet Oct 20 '11 at 21:25

2 Answers

By "normal array" I take it you mean a NumPy array of homogeneous dtype. Given a recarray, such as:

>>> a = np.array([(0, 1, 2),
...               (3, 4, 5)], [('x', int), ('y', float), ('z', int)]).view(np.recarray)
>>> a
rec.array([(0, 1.0, 2), (3, 4.0, 5)], 
      dtype=[('x', '<i4'), ('y', '<f8'), ('z', '<i4')])

we must first give every column the same dtype. We can then convert it to a "normal array" by viewing the data as that common dtype:

>>> a.astype([('x', '<f8'), ('y', '<f8'), ('z', '<f8')]).view('<f8')
array([ 0.,  1.,  2.,  3.,  4.,  5.])

astype returns a new numpy array, so the above requires additional memory in an amount proportional to the size of a. With the dtypes shown, each row of a requires 4+8+4=16 bytes, while each row of a.astype(...) requires 8*3=24 bytes. Calling view requires no new memory, since view just changes how the underlying data is interpreted.
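
The byte counts and the no-copy behaviour of view can be checked directly. This is a minimal sketch that assumes the explicit '<i4'/'<f8' field widths from the output above (a plain int may be 8 bytes on other platforms); the names mixed, uniform, converted, flat and table are only illustrative. The final reshape is an extra step that turns the flat result into one row per record:

```python
import numpy as np

# Explicit field widths so the byte counts match the answer on any platform.
mixed = np.dtype([('x', '<i4'), ('y', '<f8'), ('z', '<i4')])
uniform = np.dtype([('x', '<f8'), ('y', '<f8'), ('z', '<f8')])

a = np.array([(0, 1, 2), (3, 4, 5)], dtype=mixed).view(np.recarray)

converted = a.astype(uniform)      # copies: every field becomes float64
flat = converted.view('<f8')       # no copy: same buffer, reinterpreted
table = flat.reshape(len(a), -1)   # one row per record, one column per field

print(a.dtype.itemsize, converted.dtype.itemsize)  # bytes per row, before and after
print(np.shares_memory(converted, flat))           # whether view reused the buffer
print(table)
```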

a.tolist() returns a new Python list. Each Python number is an object which requires more bytes than its equivalent representation in a numpy array. So a.tolist() requires more memory than a.astype(...).

Calling a.astype(...).view(...) is also faster than np.array(a.tolist()):

In [8]: a = np.array(list(zip(*[iter(range(300))]*3)),[('x', int), ('y', float), ('z', int)]).view(np.recarray)

In [9]: %timeit a.astype([('x', '<f8'), ('y', '<f8'), ('z', '<f8')]).view('<f8')
10000 loops, best of 3: 165 us per loop

In [10]: %timeit np.array(a.tolist())
1000 loops, best of 3: 683 us per loop
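
As an alternative to the astype/view approach above, newer NumPy releases (1.16 and later, as far as I know) provide numpy.lib.recfunctions.structured_to_unstructured, which does the cast and returns a 2-D array in one call. A minimal sketch, not timed against the approaches above; the np.asarray call is a cautious extra step to hand the helper a plain structured array and may not be required:

```python
import numpy as np
from numpy.lib import recfunctions as rfn

a = np.array([(0, 1, 2), (3, 4, 5)],
             dtype=[('x', '<i4'), ('y', '<f8'), ('z', '<i4')]).view(np.recarray)

# Drop the recarray subclass first (a precaution, possibly unnecessary).
plain = np.asarray(a)

# Fields are cast to a common dtype and laid out as columns.
arr = rfn.structured_to_unstructured(plain)

print(arr.shape, arr.dtype)
print(arr)
```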
unutbu
  • You may need to ensure that the array is contiguous: np.ascontiguousarray(a, [('x', ' – lib Apr 15 '15 at 07:01

Here is a relatively clean solution using pandas:

>>> import numpy as np
>>> import pandas as pd
>>> a = np.recarray((2,), dtype=[('x', int), ('y', float), ('z', int)])
>>> arr = pd.DataFrame(a).to_numpy()
>>> arr
array([[9.38925058e+013, 0.00000000e+000, 1.40380704e+014],
       [1.40380704e+014, 6.93572751e-310, 1.40380484e+014]])
>>> arr.shape
(2, 3)
>>> arr.dtype
dtype('float64')

First the data from the recarray are loaded into a pd.DataFrame, then the data are exported using the DataFrame.to_numpy method. As we can see, this method call has automatically converted all of the data to type float64.

Jasha