Most efficient way to merge lists of objects based on max value of object's property

Question

I want to do exactly what this post does, but with a 2D array of objects instead of a 2D array of numbers.

class ExchangeRate:
    rate = None
    name = None
    otherUsefulProperty = None

    def __init__(self, _rate, _name, _otherUsefulProperty):
        self.rate = _rate
        self.name = _name
        self.otherUsefulProperty = _otherUsefulProperty

This is my object's class. The `rate` attribute is the one that decides which object wins when the graphs are merged.

The code below works well for a stack of 2D arrays of numbers, but I haven't figured out how to do the same thing efficiently with a 2D array of objects. Performance is critical for what I'm doing, and this assumes a 2D array of objects is actually performant. If it's not and there's a faster approach, please let me know.

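import numpy as np

# Stand-in for the asker's own logging helper (not shown in the question); it just prints.
def PrintAndLog(message):
    print(message)

# A stack of 4 graphs, each 3x3 (np.empty allocates without initializing)
graph = np.empty((4, 3, 3), dtype=float, order='F')
graph[0] = [[0, 0, 1], [1, 0, 1], [2, 0, 0]]
graph[1] = [[0, 0, 1], [1, 0, 1], [2, 0, 0]]
graph[2] = [[5, 0, 0], [1, 0, 1], [2, 0, 0]]
graph[3] = [[2, 1, 0], [9, 0, 1], [0, 0, 0]]

PrintAndLog("graph of type " + str(type(graph)) + " = \n" + str(graph))
PrintAndLog("\n\n")
# Element-wise max across the 4 graphs, giving one 3x3 graph
resultGraph = graph.max(axis=0)
PrintAndLog("resultGraph of type " + str(type(resultGraph)) + " = \n" + str(resultGraph))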

Output:

graph of type <class 'numpy.ndarray'> = 
[[[ 0.  0.  1.]
  [ 1.  0.  1.]
  [ 2.  0.  0.]]

 [[ 0.  0.  1.]
  [ 1.  0.  1.]
  [ 2.  0.  0.]]

 [[ 5.  0.  0.]
  [ 1.  0.  1.]
  [ 2.  0.  0.]]

 [[ 2.  1.  0.]
  [ 9.  0.  1.]
  [ 0.  0.  0.]]]


resultGraph of type <class 'numpy.ndarray'> = 
[[ 5.  1.  1.]
 [ 9.  0.  1.]
 [ 2.  0.  0.]]
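If the metadata is kept in parallel arrays rather than inside objects, one possible alternative (not something proposed in this thread) keeps the whole merge inside NumPy: `argmax(axis=0)` finds which graph holds the max rate in each cell, and `np.take_along_axis` pulls the matching rate and metadata out of that graph. The arrays and their contents below are hypothetical; treat this as a sketch under those assumptions.

```python
import numpy as np

# Hypothetical data: 3 graphs of shape (2, 2), stored as parallel arrays
# (one array per attribute) instead of one array of objects.
rates = np.array([
    [[0.0, 2.0], [1.0, 0.5]],
    [[3.0, 1.0], [0.0, 0.5]],
    [[1.0, 4.0], [2.0, 0.1]],
])
names = np.array([
    [["a0", "b0"], ["c0", "d0"]],
    [["a1", "b1"], ["c1", "d1"]],
    [["a2", "b2"], ["c2", "d2"]],
])

# Index (along axis 0) of the graph holding the max rate in each cell.
winner = rates.argmax(axis=0)        # shape (2, 2)

# take_along_axis needs indices with the same ndim as the source array.
idx = winner[np.newaxis, ...]        # shape (1, 2, 2)
merged_rates = np.take_along_axis(rates, idx, axis=0)[0]
merged_names = np.take_along_axis(names, idx, axis=0)[0]

print(merged_rates)
print(merged_names)
```

Because `argmax` returns the first occurrence of the maximum, ties go to the earliest graph, and `merged_rates` equals `rates.max(axis=0)`. Any further metadata array can be gathered with the same `idx`.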

Possible final solution:

Hopefully someone else finds this useful. I just did a ton of performance testing and there is a clear winner.

TL;DR: Winner = np.array of ExchangeRate objects + graph.max(axis=0). This is by far the fastest approach I've tried. Again, the goal is to merge many rate graphs, where each rate also has metadata that needs to be merged along with it.
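A minimal sketch of that combination, assuming (per the asker's comment on the answer) that `ExchangeRate` defines `__ge__`, since that is the comparison `graph.max(axis=0)` needed on an object array. The 2x2 rate values and the `g0`/`g1`/`g2` names are made up for illustration.

```python
import numpy as np


class ExchangeRate:
    def __init__(self, _rate, _name, _otherUsefulProperty):
        self.rate = _rate
        self.name = _name
        self.otherUsefulProperty = _otherUsefulProperty

    # Used by sorting (as in the accepted answer).
    def __lt__(self, other):
        return self.rate < other.rate

    # Used when NumPy takes the max of an object array.
    def __ge__(self, other):
        return self.rate >= other.rate


# Hypothetical data: 3 graphs of 2x2 rates; each object's name records its graph.
raw_rates = [
    [[0.0, 2.0], [1.0, 0.5]],
    [[3.0, 1.0], [0.0, 0.7]],
    [[1.0, 4.0], [2.0, 0.1]],
]
graph = np.empty((3, 2, 2), dtype=object)
for g in range(3):
    for i in range(2):
        for j in range(2):
            graph[g, i, j] = ExchangeRate(raw_rates[g][i][j], f"g{g}", None)

# Whole objects are kept, so name/otherUsefulProperty travel with the max rate.
merged = graph.max(axis=0)   # 2x2 object array of ExchangeRate

print([[e.rate for e in row] for row in merged])
print([[e.name for e in row] for row in merged])
```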

Here are the test results for the final methods I narrowed it down to. Each test timed only the merge step (merging several graphs into one based on rate). I recorded two data points: the average duration over 200 runs and the duration of the very first run. The first run mattered because it was sometimes longer than the average, possibly due to caching.

test_graph_4: 200-run average = 0.0003026 s (first run about the same)
test_graph_3: 200-run average = 0.0003836 s (first run about the same)
test_graph_2b: 200-run average = 0.000018 s (first run = 0.000092 s)
test_graph_2a: 200-run average = 0.000066 s (first run = 0.000143 s)
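A minimal sketch of the timing methodology described above (time only the merge, report the 200-run average and the first run). This is not the asker's test code; `time_merge` and the random input are hypothetical.

```python
import time

import numpy as np


def time_merge(merge_fn, graphs, runs=200):
    """Time only merge_fn(graphs); return (first_run_seconds, average_seconds)."""
    durations = []
    for _ in range(runs):
        start = time.perf_counter()
        merge_fn(graphs)  # only the merge itself is inside the timed region
        durations.append(time.perf_counter() - start)
    return durations[0], sum(durations) / len(durations)


# Example input: 4 random 3x3 rate graphs (hypothetical data).
graph = np.random.default_rng(0).random((4, 3, 3))
first, avg = time_merge(lambda g: g.max(axis=0), graph)
print(f"200-run average = {avg:.7f} s (first run {first:.7f} s)")
```

Building the inputs outside `time_merge` keeps object construction out of the measurement, which matches "timed based on merging only".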

Construct a working example with your class, and show what you've managed to do, efficient or not. It's more work if I have to construct the lists of objects etc. to illustrate any ideas. However, past answers have shown that the performance of arrays of objects is similar to that of lists of objects. Neither "reaches inside" your objects with compiled code. — hpaulj, Nov 14 '20 at 07:50
My current implementation uses neither objects nor NumPy. I'm just using Python lists to build 2D lists and then merging them manually. Instead of objects, I keep separate 2D lists for rate, name, and otherUsefulProperty. So when I merge, I brute-force iterate over all of them, find the max rate value, and update the final merged 2D list. It's pretty computationally expensive. — LampShade, Nov 14 '20 at 14:45
I was hoping I could use the NumPy function `graph.max(axis=0)` and have it somehow reach into my objects. If I can't do that, then maybe my brute-force merging method is the best approach? And to confirm, you're saying you've seen Python lists show similar performance to NumPy arrays? I was also wondering whether keeping my brute-force merge approach but switching from Python lists to NumPy arrays would give me any performance boost. — LampShade, Nov 14 '20 at 14:48
I tested a variety of different solutions for this, timed them, and posted code + results above. — LampShade, Nov 14 '20 at 19:38
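A minimal pure-Python sketch of the brute-force merge LampShade describes, assuming three aligned lists of 2D lists (one per attribute, same graph count and shape) and that the earliest graph wins ties. The function name `merge_parallel_lists` and the name/other values are hypothetical; the rate graphs are the four from the question.

```python
def merge_parallel_lists(rate_graphs, name_graphs, other_graphs):
    """Merge graphs cell by cell, keeping the entry with the max rate."""
    rows, cols = len(rate_graphs[0]), len(rate_graphs[0][0])
    # Start from a copy of the first graph of each attribute.
    best_rate = [row[:] for row in rate_graphs[0]]
    best_name = [row[:] for row in name_graphs[0]]
    best_other = [row[:] for row in other_graphs[0]]
    for g in range(1, len(rate_graphs)):
        rates = rate_graphs[g]
        for i in range(rows):
            for j in range(cols):
                # Strictly greater, so on ties the earlier graph wins.
                if rates[i][j] > best_rate[i][j]:
                    best_rate[i][j] = rates[i][j]
                    best_name[i][j] = name_graphs[g][i][j]
                    best_other[i][j] = other_graphs[g][i][j]
    return best_rate, best_name, best_other


rate_graphs = [
    [[0, 0, 1], [1, 0, 1], [2, 0, 0]],
    [[0, 0, 1], [1, 0, 1], [2, 0, 0]],
    [[5, 0, 0], [1, 0, 1], [2, 0, 0]],
    [[2, 1, 0], [9, 0, 1], [0, 0, 0]],
]
# Hypothetical metadata: every cell of graph g is labelled "g<g>".
name_graphs = [[[f"g{g}"] * 3 for _ in range(3)] for g in range(4)]
other_graphs = [[[g * 10] * 3 for _ in range(3)] for g in range(4)]

rates, names, others = merge_parallel_lists(rate_graphs, name_graphs, other_graphs)
print(rates)
print(names)
```

The merged rates match `resultGraph` from the question, and the names show which graph each winning rate came from.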

score 1 · Accepted Answer · answered Nov 14 '20 at 16:00


I believe you only need to implement the `__lt__` method:

import numpy as np

class ExchangeRate:
    rate = None
    name = None
    otherUsefulProperty = None

    def __init__(self, _rate, _name, _otherUsefulProperty):
        self.rate = _rate
        self.name = _name
        self.otherUsefulProperty = _otherUsefulProperty
    
    # this method allows sorting, comparison, etc.
    def __lt__(self, other):
        return self.rate < other.rate

a = np.array([ExchangeRate(3,2,1)])

b = np.array([ExchangeRate(1,2,3)])

# No __gt__ is defined, so Python falls back to the reflected b.__lt__(a)
a>b

Out:

array([ True])


Quang Hoang


This works. I had to use `__ge__` instead of `__lt__` for it to work with `graph.max(axis=0)`, but it totally works. Now I'm going to do some testing to see if it's actually more performant than my current brute-force method. – LampShade Nov 14 '20 at 17:53
I updated the OP with the current solution, which does indeed work, but I'm not sure exactly how to make it as performant as possible. I'm going to try a few things. I read this, which seems related in terms of performance: https://stackoverflow.com/a/11233356/1558119 – LampShade Nov 14 '20 at 18:04
I did tons of testing and updated the OP. I found the best-performing approach I could come up with, and it does indeed use an `np.array` + `__ge__`. – LampShade Nov 14 '20 at 19:37
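As an alternative to hand-writing `__ge__`, Python's `functools.total_ordering` decorator (a swap-in, not something suggested in the thread) can derive `__ge__` and the other comparisons from the answer's `__lt__` plus an `__eq__`. With only `__lt__`, an expression like `a >= b` fails because the reflected fallback is `b.__le__(a)`, which is also undefined; that is consistent with the asker needing `__ge__` for `graph.max(axis=0)`. A sketch with hypothetical data:

```python
import functools

import numpy as np


@functools.total_ordering
class ExchangeRate:
    def __init__(self, _rate, _name, _otherUsefulProperty):
        self.rate = _rate
        self.name = _name
        self.otherUsefulProperty = _otherUsefulProperty

    # Equality by rate; note this makes instances unhashable unless
    # __hash__ is also defined.
    def __eq__(self, other):
        if not isinstance(other, ExchangeRate):
            return NotImplemented
        return self.rate == other.rate

    # total_ordering derives __ge__, __gt__ and __le__ from this method.
    def __lt__(self, other):
        if not isinstance(other, ExchangeRate):
            return NotImplemented
        return self.rate < other.rate


a = ExchangeRate(3, "a", None)
b = ExchangeRate(1, "b", None)
print(a >= b)  # uses the derived __ge__

# Two hypothetical 1x2 graphs stacked into shape (2, 1, 2).
graph = np.empty((2, 1, 2), dtype=object)
graph[0, 0, 0] = ExchangeRate(1.0, "g0", None)
graph[0, 0, 1] = ExchangeRate(4.0, "g0", None)
graph[1, 0, 0] = ExchangeRate(2.0, "g1", None)
graph[1, 0, 1] = ExchangeRate(3.0, "g1", None)

merged = graph.max(axis=0)  # shape (1, 2)
print([(e.rate, e.name) for e in merged[0]])
```

The Python docs note that the derived methods are slower than hand-written ones, so for the performance-critical merge above, an explicit `__ge__` may still be the better choice.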

