How to store by columns in a structured numpy array

Question

I have a list of tuples that look like this:

>>> y
[(0,1,2,3,4,...,10000), ('a', 'b', 'c', 'd', ...), (3.2, 4.1, 9.2, 12., ...), ]

and so on. y has 7 tuples, each with 10,000 values. All 10,000 values in a given tuple share the same dtype, and I also have a list of these dtypes:

>>> dt
[('0', dtype('int64')), ('1', dtype('<U')), ('2', dtype('<U')), ('3', dtype('int64')), ('4', dtype('<U')), ('5', dtype('float64')), ('6', dtype('<U'))]

My intent is to do something like x = np.array(y, dtype=dt), but when I do that, I get the following error:

Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ValueError: could not assign tuple of length 10000 to structure with 7 fields.

I understand that this is because dtype is saying that the first value in the tuple must be an int64, the second value must be a string, and so on, and that I only have 7 dtypes for a tuple with 10,000 values.

How can I communicate to the code that I mean that ALL values of the first tuple are int64s, and ALL values of the second tuple are strings, etc.?

I've also tried having y be a list of lists instead of a list of tuples:

>>> y
[[0,1,2,3,4,...,10000], ['a', 'b', 'c', 'd', ...], ...]

and so on, but that fails for the same underlying reason:

>>> x = np.array(y, dtype=dt)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: invalid literal for int() with base 10: 'Supplier#000000001'

Any help is appreciated!

Edit: My goal is to have x be a numpy array.

Use `list(zip(*y))` to transpose it into the right list of tuples for your `dt`. — hpaulj, Jun 05 '18 at 22:09
Just for the record, using 1D arrays of tuples of typed data might not be the easiest way to achieve what you are looking for. Why don't you use [structured arrays](https://docs.scipy.org/doc/numpy-1.13.0/user/basics.rec.html), something like a 2D 7*20000 array? — zar3bski, Jun 05 '18 at 22:55
@DavidZarebski part of the project requires that records are stored by column and not by row. Initially, I tried to use structured arrays but realized the format did not fit. — Tingy, Jun 06 '18 at 18:12

hpaulj · Accepted Answer · 2018-06-05T23:35:41.487

Use the zip(*...) idiom to 'transpose' your list of tuples:

In [150]: alist = [(0,1,2,3,4),tuple('abcde'),(.1,.2,.4,.6,.8)]
In [151]: alist
Out[151]: [(0, 1, 2, 3, 4), ('a', 'b', 'c', 'd', 'e'), (0.1, 0.2, 0.4, 0.6, 0.8)]
In [152]: dt = np.dtype([('0',int),('1','U3'),('2',float)])


In [153]: list(zip(*alist))
Out[153]: [(0, 'a', 0.1), (1, 'b', 0.2), (2, 'c', 0.4), (3, 'd', 0.6), (4, 'e', 0.8)]
In [154]: np.array(_, dt)
Out[154]: 
array([(0, 'a', 0.1), (1, 'b', 0.2), (2, 'c', 0.4), (3, 'd', 0.6),
       (4, 'e', 0.8)], dtype=[('0', '<i8'), ('1', '<U3'), ('2', '<f8')])
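The session above can also be written as a standalone script. This is only a minimal sketch on the same toy data; the name `columns` and the explicit `np.int64` / `np.float64` field types are choices made here for illustration and do not come from the question.

```python
import numpy as np

# Column-oriented input: one tuple per field, all the same length.
columns = [(0, 1, 2, 3, 4), tuple("abcde"), (0.1, 0.2, 0.4, 0.6, 0.8)]
dt = np.dtype([("0", np.int64), ("1", "U3"), ("2", np.float64)])

# zip(*columns) regroups the values into one tuple per record,
# which is the layout np.array expects for a structured dtype.
records = list(zip(*columns))
x = np.array(records, dtype=dt)

# Each field can still be read back as a whole column.
ints = x["0"]
labels = x["1"]
```

Indexing by field name, as in `x["0"]`, gives back a 1-D array for that column, although the records themselves are laid out row by row in memory.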

There is also a recarray creator that takes a list of arrays:

In [160]: np.rec.fromarrays(alist,dtype=dt)
Out[160]: 
rec.array([(0, 'a', 0.1), (1, 'b', 0.2), (2, 'c', 0.4), (3, 'd', 0.6),
           (4, 'e', 0.8)],
          dtype=[('0', '<i8'), ('1', '<U3'), ('2', '<f8')])
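A recarray additionally exposes fields as attributes when the field names are valid Python identifiers. The sketch below assumes hypothetical field names `idx`, `label` and `value` (the names `'0'`, `'1'`, `'2'` used above cannot be reached with attribute syntax).

```python
import numpy as np

columns = [(0, 1, 2, 3, 4), tuple("abcde"), (0.1, 0.2, 0.4, 0.6, 0.8)]

# Hypothetical field names, chosen so that attribute access works.
dt = np.dtype([("idx", np.int64), ("label", "U3"), ("value", np.float64)])

# fromarrays takes the columns as they are; no transposing is needed.
rec = np.rec.fromarrays(columns, dtype=dt)

# Fields are reachable both by attribute and by name.
labels_by_attr = rec.label
labels_by_name = rec["label"]
```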

There is also a numpy.lib.recfunctions module (it has to be imported separately) with functions for working with recarrays and structured arrays.

As commented:

In [169]: np.fromiter(zip(*alist),dt)
Out[169]: 
array([(0, 'a', 0.1), (1, 'b', 0.2), (2, 'c', 0.4), (3, 'd', 0.6),
       (4, 'e', 0.8)], dtype=[('0', '<i8'), ('1', '<U3'), ('2', '<f8')])
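When the number of records is known in advance, `np.fromiter` also accepts a `count` argument, so the output can be allocated once instead of being grown while the iterator is consumed. A small sketch, again on the toy data and with the hypothetical name `columns`:

```python
import numpy as np

columns = [(0, 1, 2, 3, 4), tuple("abcde"), (0.1, 0.2, 0.4, 0.6, 0.8)]
dt = np.dtype([("0", np.int64), ("1", "U3"), ("2", np.float64)])

# Every column has the same length, so the first one gives the record count.
n_records = len(columns[0])

# zip(*columns) is consumed lazily; no intermediate list of tuples is built.
x = np.fromiter(zip(*columns), dtype=dt, count=n_records)
```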

In order to avoid `list` in `np.array(list(zip(...` one can also do this: `np.fromiter(zip(*y), dtype=dt)` — AGN Gazer, Jun 05 '18 at 23:30

score 0 · Answer 2 · answered Jun 05 '18 at 17:45

Probably not the most elegant solution, but list comprehension works:

x = [np.array(tup, dtype=typ[1]) for tup, typ in zip(y, dt)]
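The per-column arrays from this comprehension can still be packed into a single structured array by filling one field at a time. The following is a rough sketch on made-up data, not the asker's real `y`. It also assumes that an unsized `<U` field inside a structured dtype may leave no room for the text, so it derives each field's type from the converted column instead; `cols` and `sized_dt` are names introduced here.

```python
import numpy as np

# Made-up stand-ins for the question's y and dt.
y = [(0, 1, 2, 3), ("a", "bb", "ccc", "d"), (3.2, 4.1, 9.2, 12.0)]
dt = [("0", np.dtype("int64")), ("1", np.dtype("<U")), ("2", np.dtype("float64"))]

# One plain array per column; np.array picks a string width that fits the data.
cols = [np.array(tup, dtype=typ) for tup, (_, typ) in zip(y, dt)]

# Build the record dtype from the converted columns so string fields are sized.
sized_dt = np.dtype([(name, col.dtype) for (name, _), col in zip(dt, cols)])

# Allocate the structured array, then fill it column by column.
x = np.empty(len(cols[0]), dtype=sized_dt)
for (name, _), col in zip(dt, cols):
    x[name] = col
```

Filling field by field avoids transposing the data in Python, at the cost of an explicit loop over the fields.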

ilja

Not exactly what I'm looking for, since I want x to be a numpy array. My bad on neglecting to say it. I will edit my post to mention that. – Tingy Jun 05 '18 at 19:00
