So your object is a dictionary like:
In [49]: dd = {
    ...:     'text': "I liked this phone.",
    ...:     'rating': 5.0,
    ...:     'positive': True
    ...: }
I could make an object dtype array that contains 5 copies of this dictionary (or similar objects), with numpy imported as np:
In [50]: arrO = np.empty((5,), object)
In [51]: dict(dd)
Out[51]: {'text': 'I liked this phone.', 'rating': 5.0, 'positive': True}
In [52]: for i in range(5):
    ...:     arrO[i] = dict(dd)
    ...:
In [53]: arrO
Out[53]:
array([{'text': 'I liked this phone.', 'rating': 5.0, 'positive': True},
{'text': 'I liked this phone.', 'rating': 5.0, 'positive': True},
{'text': 'I liked this phone.', 'rating': 5.0, 'positive': True},
{'text': 'I liked this phone.', 'rating': 5.0, 'positive': True},
{'text': 'I liked this phone.', 'rating': 5.0, 'positive': True}],
dtype=object)
But such an object array is much like a list - both contain pointers to objects elsewhere in memory:
In [54]: [dict(dd) for _ in range(5)]
Out[54]:
[{'text': 'I liked this phone.', 'rating': 5.0, 'positive': True},
{'text': 'I liked this phone.', 'rating': 5.0, 'positive': True},
{'text': 'I liked this phone.', 'rating': 5.0, 'positive': True},
{'text': 'I liked this phone.', 'rating': 5.0, 'positive': True},
{'text': 'I liked this phone.', 'rating': 5.0, 'positive': True}]
Iteration on the list is faster. Most operations on the object array involve iteration, with the exception of things like reshape that don't require access to the individual elements.
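As a small sketch of that point (the names ratings_list, ratings_arr and arr2d are just for illustration): pulling one key out of every dictionary takes a Python-level loop for the list and for the object array alike, while reshape only rearranges the pointers.

```python
import numpy as np

dd = {'text': "I liked this phone.", 'rating': 5.0, 'positive': True}

# A plain list and an object dtype array holding similar dictionaries
lst = [dict(dd) for _ in range(5)]
arrO = np.empty((5,), object)
for i in range(5):
    arrO[i] = dict(dd)

# Extracting a value needs a Python-level loop in both cases
ratings_list = [d['rating'] for d in lst]
ratings_arr = [d['rating'] for d in arrO]

# reshape does not touch the elements; the new array points to the same dicts
arr2d = arrO.reshape(5, 1)
same_object = arr2d[0, 0] is arrO[0]
```

Here same_object is True: the reshaped array refers to the very same dictionary objects.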
Another option is to make a structured array.
The key to making a structured array is defining a compound dtype and providing the data as a list of tuples.
Dictionaries keep insertion order (an implementation detail in CPython 3.6, guaranteed by the language from 3.7), so values() gives the desired order:
In [55]: tuple(dd.values())
Out[55]: ('I liked this phone.', 5.0, True)
In [56]: dt = np.dtype([('text','U30'),('rating',float),('positive',bool)])
In [57]: dt
Out[57]: dtype([('text', '<U30'), ('rating', '<f8'), ('positive', '?')])
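One thing to keep in mind with a fixed-width string field such as 'U30': as far as I know, longer strings are silently truncated when stored. A minimal sketch (the sample text is made up for illustration):

```python
import numpy as np

dt = np.dtype([('text', 'U30'), ('rating', float), ('positive', bool)])

# Made-up review text that is longer than 30 characters
long_text = "I liked this phone, but the battery could be better."
arr = np.array([(long_text, 4.0, True)], dtype=dt)

stored = str(arr['text'][0])   # only the first 30 characters are kept
itemsize = dt.itemsize         # bytes per record: 30*4 (text) + 8 (float) + 1 (bool)
```

If the texts can be long, pick a wider 'U' field; every record then takes that much space regardless of the actual string length.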
Make the array with a list of tuples:
In [58]: arrS = np.array([tuple(dd.values()) for _ in range(5)],dtype=dt)
In [59]: arrS
Out[59]:
array([('I liked this phone.', 5., True),
('I liked this phone.', 5., True),
('I liked this phone.', 5., True),
('I liked this phone.', 5., True),
('I liked this phone.', 5., True)],
dtype=[('text', '<U30'), ('rating', '<f8'), ('positive', '?')])
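Relying on values() only works when every dictionary has its keys in the same order as the dtype fields. A slightly more defensive sketch picks the values by field name instead (the helper name to_record is hypothetical, and the sample reviews are made up):

```python
import numpy as np

dt = np.dtype([('text', 'U30'), ('rating', float), ('positive', bool)])

def to_record(d, dtype):
    """Return the dict's values as a tuple ordered like the dtype's fields."""
    return tuple(d[name] for name in dtype.names)

# Keys deliberately given in a different order than the dtype fields
reviews = [
    {'rating': 5.0, 'positive': True, 'text': "I liked this phone."},
    {'positive': False, 'text': "Battery died fast.", 'rating': 2.0},
]

arrS = np.array([to_record(d, dt) for d in reviews], dtype=dt)
```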
Access fields by name. Note that this is a 1d array (5,) with 3 fields, not a (5,3) array:
In [60]: arrS['rating']
Out[60]: array([5., 5., 5., 5., 5.])
In [61]: arrS['positive']
Out[61]: array([ True, True, True, True, True])
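To make the "1d with fields" point concrete, here is a short sketch (variable names are illustrative) that checks the shape, lists the field names, indexes a single record, and confirms that selecting a field gives a view into the same memory:

```python
import numpy as np

dt = np.dtype([('text', 'U30'), ('rating', float), ('positive', bool)])
arrS = np.array([("I liked this phone.", 5.0, True)] * 5, dtype=dt)

shape = arrS.shape            # one dimension, five records
names = arrS.dtype.names      # field names, in dtype order

first = arrS[0]               # a single record holding all three fields
first_rating = first['rating']

# Selecting a field gives an ordinary 1d array that is a view, not a copy
ratings = arrS['rating']
is_view = np.shares_memory(ratings, arrS)
```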
Modifying the values of the fields:
In [62]: arrS['positive'] = [1,0,0,1,0]
In [63]: arrS['rating'] = np.arange(5)
In [64]: arrS
Out[64]:
array([('I liked this phone.', 0., True),
('I liked this phone.', 1., False),
('I liked this phone.', 2., False),
('I liked this phone.', 3., True),
('I liked this phone.', 4., False)],
dtype=[('text', '<U30'), ('rating', '<f8'), ('positive', '?')])
We can do math on the numeric fields:
In [65]: np.sum(arrS['rating'])
Out[65]: 10.0
Using the boolean field as a mask:
In [66]: arrS[arrS['positive']]
Out[66]:
array([('I liked this phone.', 0., True),
('I liked this phone.', 3., True)],
dtype=[('text', '<U30'), ('rating', '<f8'), ('positive', '?')])
In [67]: arrS[~arrS['positive']]
Out[67]:
array([('I liked this phone.', 1., False),
('I liked this phone.', 2., False),
('I liked this phone.', 4., False)],
dtype=[('text', '<U30'), ('rating', '<f8'), ('positive', '?')])
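Masking and math on fields combine naturally. As a sketch using the same values as above (mean_pos and mean_neg are illustrative names), the mean rating for each group could be computed like this:

```python
import numpy as np

dt = np.dtype([('text', 'U30'), ('rating', float), ('positive', bool)])
arrS = np.array([("I liked this phone.", 5.0, True)] * 5, dtype=dt)
arrS['positive'] = [1, 0, 0, 1, 0]
arrS['rating'] = np.arange(5)

mask = arrS['positive']
mean_pos = arrS['rating'][mask].mean()    # averages ratings 0 and 3
mean_neg = arrS['rating'][~mask].mean()   # averages ratings 1, 2 and 4
```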
Operations on the fields of a structured array are faster than the equivalent ones on an object dtype array, though a bit slower than similar ones on a standalone, fully numeric array.
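A rough way to check this on your own machine is a timeit comparison. This is only a sketch: the array size n and the repeat count are arbitrary choices, and the actual timings will vary with hardware and numpy version.

```python
import timeit
import numpy as np

n = 1000  # arbitrary size for the comparison
dd = {'text': "I liked this phone.", 'rating': 5.0, 'positive': True}
dt = np.dtype([('text', 'U30'), ('rating', float), ('positive', bool)])

arrO = np.empty((n,), object)
for i in range(n):
    arrO[i] = dict(dd)
arrS = np.array([tuple(dd.values()) for _ in range(n)], dtype=dt)
plain = arrS['rating'].copy()   # standalone, contiguous float array

def sum_object():
    # Python-level loop over the dictionaries
    return sum(d['rating'] for d in arrO)

def sum_structured():
    # compiled sum over a strided field view
    return arrS['rating'].sum()

def sum_plain():
    # compiled sum over a contiguous array
    return plain.sum()

for f in (sum_object, sum_structured, sum_plain):
    t = timeit.timeit(f, number=200)
    print(f"{f.__name__}: {t:.4f} s")
```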