
I've got an empty structured array:

import numpy as np

id_and_orders_type = np.dtype([('id', 'i4'), ('order_values', 'f4', (100,))])
id_and_orders = np.zeros((10,), dtype=id_and_orders_type)

and I've got another structured array with the data to be filled into id_and_orders.

orders_type = np.dtype([('id', 'i4'), ('value', 'f4')])
orders = np.array([(1, 33.2), (2, 37.1), (3, 22.1), (2, 63.9), (3, 93.1)], dtype=orders_type)

What I want to do now is map every orders['value'] to its corresponding id in id_and_orders, so that id_and_orders contains each orders['id'] together with a subarray of the values for that id in orders:

id_and_orders = np.array(((1, (33.2,)), (2, (37.1, 63.9)), (3, (22.1, 93.1))))

And does anyone know how to size the subarray id_and_orders['order_values'] dynamically, rather than fixing it at 100?

Andrew

2 Answers


I recommend using a Pandas DataFrame instead:

import pandas as pd

df = pd.DataFrame(
    [(1, 33.2), (2, 37.1), (3, 22.1), (2, 63.9), (3, 93.1)],
    columns=['type', 'value']
)
#    type  value
# 0     1   33.2
# 1     2   37.1
# 2     3   22.1
# 3     2   63.9
# 4     3   93.1

which then easily lets you group by type and, e.g., take the sum of the values:

df.groupby('type').sum()
#       value
# type       
# 1      33.2
# 2     101.0
# 3     115.2
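
If, as raised in the comments below, you want to keep the individual values per type rather than aggregate them, the same groupby can collect them into plain Python lists. A minimal sketch (output shown for illustration):

df.groupby('type')['value'].apply(list)
# type
# 1          [33.2]
# 2    [37.1, 63.9]
# 3    [22.1, 93.1]
# Name: value, dtype: object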
Nils Werner
  • Thank you! Is it also possible to have the values within a subseries rather than the sum of them? – Andrew Sep 14 '18 at 14:26
  • No. But what would be the point in that anyways? You're grouping them so you can do processing on them, not just to group them and leave them there. – Nils Werner Sep 14 '18 at 14:31
  • Because I need to visualize every single value and I want to access the data via the id. It's actually an order_book. – Andrew Sep 14 '18 at 15:09
  • Again, something you can easily do in Pandas. It is the right tool for the job! – Nils Werner Sep 14 '18 at 20:13

This is something I'd write using Python data structures, and only consider converting some of them to numpy if performance becomes an issue. Numpy allows for fast access on items of uniform type (and shape, for multidimensional data). When data doesn't exactly match that format, it's a good idea to consider incorporating lists and dicts.

Rather than an array, just use a list for input:

id_orders = [(1, 33.2), (2, 37.1), (3, 22.1), (2, 63.9), (3, 93.1)]

Then create a dict of orders by id, with each key mapping to a list of the values belonging to that key.

orders = {}
for id, val in id_orders:
    # collect each value under its id
    orders.setdefault(id, []).append(val)

Using setdefault returns the current value for a key if the key is present. If it is not, it inserts the key with an empty list and returns that list. Once orders has been initialized, it's simple enough to convert each entry to a numpy array. Do this step last, since numpy arrays don't handle changes in size very well.

import numpy as np

orders = {k: np.array(v) for k, v in orders.items()}
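
With the example data above, this leaves one numpy array per id (result shown for illustration):

orders
# {1: array([33.2]), 2: array([37.1, 63.9]), 3: array([22.1, 93.1])}

collections.defaultdict(list) would do the same job as setdefault here; it's purely a style choice.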
user2699
  • Thanks very much, but this is only a small example of a bigger problem. Actually there are 15 columns and 50,000 rows, plus sub-frames. – Andrew Sep 14 '18 at 15:11
  • 15 columns you want to use as an id, or 15 columns you want stored? – user2699 Sep 14 '18 at 15:16
  • This will work fine even with much larger data than the example. My computer runs 50,000 rows with 1000 unique ids in 16ms. – user2699 Sep 14 '18 at 15:17
  • Sure, but I've got multi-dimensional subframes within these values, and there will be a need for multiple, high-frequency queries on the data – Andrew Sep 14 '18 at 15:20
  • And this solution can't handle that because? You're trying to fit numpy into a problem it wasn't designed for. It will be substantially more difficult (and possibly slower) in the end than using the data types that python provides (or a proper database, which seems to be what you're trying to create). – user2699 Sep 14 '18 at 15:27