
A minimal example of what I am trying to do is as follows: suppose I have a list of instances of a point class, representing points in the xy-plane, defined by:

class point:
    def __init__(self,x_value, y_value):
        self.x = x_value
        self.y = y_value

r1 = point(0,1)
r2 = point(2,1)
r3 = point(5,6)

all_points = [r1,r2,r3]

Now I want to construct a subset of this list containing only those elements that satisfy certain conditions on self.x and self.y, say self.x < 4 and self.y < 4, so the result of the process should give me the list subset = [r1,r2]. Side note: the order is actually not important and the elements are unique, so I could in principle also work with sets instead of lists – not sure whether that would be beneficial.

(EDIT: The below is already an improvement over my original code, based on the answers given so far. But I would like to know if I can make it even faster.) What I currently do is

subset = [r for r in all_points if r.x < 4 and r.y < 4]

This works, but in my actual case the loop runs over 60000 entries, and I have to do this pruning many, many times, so it ends up taking longer than my application can afford.

So my question is: Is there a way to prune the list faster? Thanks!

Britzel
  • combine all your pruning tests in 1 function and just traverse the list once – rioV8 Jul 07 '23 at 12:35
  • @rioV8 Thanks for your reply! Yes! But how do I do that. That is precisely my question, just paraphrased. : ) (If I understand your statement correctly.) – Britzel Jul 07 '23 at 12:37
  • you only give 1 test, so it is hard for us to guess what your code looks like now; give 2 or 3 test examples – rioV8 Jul 07 '23 at 12:38
  • @rioV8 If with "one test" you mean the above for loop, then I need this one test to be faster. I cannot do all tests, since further tests depend on outcomes of calculations. But all tests will be of that nature, so if I can make this one test faster, I can also make all future tests faster. But there is no way to do all tests at once. – Britzel Jul 07 '23 at 12:43
  • Does the content of the objects change between your "pruning" runs? If not (i.e. only the criteria change), there may be something that can be done with indexing and sets. – Alain T. Jul 07 '23 at 13:12
  • @AlainT. With content, do you mean the attributes which are used in the pruning conditions? Most of them stay the same actually, there is only one which changes. So it could help! Could you give me a brief outline of your idea please? : ) What will change for all is the precise condition imposed at each pruning. – Britzel Jul 07 '23 at 13:17
  • Multithreading? For example, if you have 8 cpu cores, create 8 threads with each one working on a part of the original list. Then combine the results. – 001 Jul 07 '23 at 13:26
  • My idea would be to setup various dictionaries with an attribute value as key and the set of corresponding objects as value. Getting the subset of objects that meet the condition could be done by intersecting the sets of object obtained from the dictionary of attribute values. You would only setup the dictionaries once and then be able to check various conditions. This may only provide improvements based on how distinctive the attribute values are (so data dependent). We would need a better understanding of the data to improve on list comprehensions – Alain T. Jul 07 '23 at 13:27
  • @AlainT. So investing in precomputing to then be able to fast lookup? This could work! I will explore this idea a bit, thanks for the hint! – Britzel Jul 07 '23 at 13:38
  • @001 Sounds like it should be an option indeed! I have no experience with multithreading yet, but will look into it, thanks! – Britzel Jul 07 '23 at 13:39
  • I tried multithreading and it was actually slower than list comprehension. Probably due to the overhead of setting up the threads, creating the sub-lists and merging the sub-lists. – 001 Jul 07 '23 at 14:46
  • Python's multithreading is closer to cooperative multitasking than actual threads because of the global interpreter lock. You don't really get parallel execution of code. It is beneficial when some of the threads need to wait for I/O (which lets other threads get full CPU power during the wait); otherwise you would need multiprocessing, but then you need to communicate data between processes, which could be just as prohibitive. – Alain T. Jul 07 '23 at 15:42
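The threaded variant discussed in the comments above might look like the following sketch (chunk the list across a thread pool, filter each chunk, merge the results). As noted, the GIL usually makes this slower than a plain list comprehension for a CPU-bound test; the chunking scheme and worker count here are illustrative, not from the discussion:

```python
from concurrent.futures import ThreadPoolExecutor
from random import randrange

class Point:
    def __init__(self, x, y):
        self.x, self.y = x, y

points = [Point(randrange(100), randrange(100)) for _ in range(60000)]

def filter_chunk(chunk):
    # The same per-element test as the list comprehension, on one slice.
    return [p for p in chunk if p.x < 4 and p.y < 4]

n_workers = 8
size = len(points) // n_workers + 1
chunks = [points[i:i + size] for i in range(0, len(points), size)]

with ThreadPoolExecutor(max_workers=n_workers) as ex:
    # map() preserves chunk order, so the merged result keeps list order.
    subset = [p for part in ex.map(filter_chunk, chunks) for p in part]
```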

4 Answers


Usually a list comprehension is faster:

subset = [r for r in all_points if r.x < 4 and r.y < 4]
rioV8
  • Thanks a lot! This is faster by about 15% in the current example indeed (if the loop goes over 60k elements). Is this all I can squeeze out? I was more hoping I could improve by an order of magnitude. – Britzel Jul 07 '23 at 13:03
2

To illustrate the benefits of "preparing" indexing structures: a sorted list of values can be used, via binary search, to quickly get the sub-range of objects that meet a first criterion, yielding a smaller set that you then intersect with the other criteria. Depending on the nature of the attributes and the type of criterion (ranges, individual or multiple values), binary searches or hash indexing can produce great performance boosts.

Here is an example using a binary search to select eligible points on the x attribute which are then filtered sequentially on the y criterion:

from random import randrange
class point:
    def __init__(self,x_value, y_value):
        self.x = x_value
        self.y = y_value
        
points = [point(randrange(100),randrange(100)) for _ in range(60000)]

xOrder = sorted(points,key = lambda p:p.x) # points in `x` order
xs     = [p.x for p in xOrder]             # x values for binary search

from bisect import bisect_left, bisect_right

seqFilter = lambda: [p for p in points if p.x<4 and p.y<4]
binFilter = lambda: [p for p in xOrder[:bisect_left(xs,4)] if p.y < 4]

output:

from timeit import timeit

print("same output:    ", set(seqFilter())==set(binFilter())) 
print("sequential time:", timeit(seqFilter,number=1000))
print("binsect time:   ", timeit(binFilter,number=1000))

same output:    True
sequential time: 2.55
binsect time:    0.18  # 14 times faster

Note: this performance improvement depends on how selective the first criterion is. In my sample data, x < 4 matches roughly 1/25th of the points to check. Using the most restrictive criterion as the main filter, with the others applied sequentially, may be sufficient depending on your data and criteria values.

For criteria that don't involve many different values of a given attribute (e.g. x == 4 and y == 4), set intersections may also be interesting to look into:

xDict = {n:set() for n in range(101)}
yDict = {n:set() for n in range(101)}
for p in points:
    xDict[p.x].add(p)
    yDict[p.y].add(p)

seqFilter  = lambda: [p for p in points if p.x == 4 and p.y == 4]
binFilter  = lambda: [p for p in xOrder[bisect_left(xs,4):bisect_right(xs,4)] 
                      if p.y == 4]
dictFilter = lambda: xDict[4] & yDict[4] 

output:

print("same output:    ", set(seqFilter())==dictFilter())
print("sequential time ", timeit(seqFilter,number=1000))
print("binsect time    ", timeit(binFilter,number=1000))
print("dictFilter time ", timeit(dictFilter,number=1000))

same output:    True
sequential time 2.458
binsect time    0.0388   # 63 times faster
dictFilter time 0.00623  # 394 times faster

Yet another way to improve performance is to use numpy as the filtering mechanism. numpy is generally faster than list comprehensions and doesn't require sorting. Another benefit of this approach is that it is not data dependent: it gives a very regular response time for a given data set size, independently of the distribution of attribute values:

import numpy as np

pts   = np.array(points)
npxs  = np.array([p.x for p in points])
npys  = np.array([p.y for p in points])

seqFilter  = lambda: [p for p in points if p.x < 4 and p.y < 4]
binFilter  = lambda: [p for p in xOrder[:bisect_left(xs,4)] if p.y < 4]
npFilter   = lambda: pts[(npxs<4)&(npys<4)]

output:

print("same output:   ", set(seqFilter())==set(npFilter()))
print("sequential time", timeit(seqFilter,number=1000))
print("binsect time   ", timeit(binFilter,number=1000))
print("npFilter time  ", timeit(npFilter,number=1000))

same output:    True
sequential time 2.55
binsect time    0.18  # 14 times faster
npFilter time   0.07  # 36 times faster
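A variant of the numpy approach (a sketch, not part of the answer above) avoids building an object array altogether: keep only the coordinate columns in numpy, turn the boolean mask into integer indices with np.flatnonzero, and index back into the plain Python list. The column names here are illustrative:

```python
import numpy as np
from random import randrange

class Point:
    def __init__(self, x, y):
        self.x, self.y = x, y

points = [Point(randrange(100), randrange(100)) for _ in range(60000)]

# Build the coordinate columns once; reuse them for every pruning pass.
xcol = np.array([p.x for p in points])
ycol = np.array([p.y for p in points])

# Boolean mask -> ascending integer indices -> original Python objects.
idx = np.flatnonzero((xcol < 4) & (ycol < 4))
subset = [points[i] for i in idx]
```

Since flatnonzero returns indices in ascending order, the subset preserves the original list order, unlike the object-array route which returns a numpy array.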
Alain T.

You can use a list comprehension, which is usually faster than an explicit loop. Here is a code sample:

import random
class point:
    def __init__(self,x_value, y_value):
        self.x = x_value
        self.y = y_value
# Sample init
all_points = [point(random.randrange(0,10),random.randrange(0,10)) for i in range(60000)]



# List comprehension
subset = [r for r in all_points if r.x < 4 and r.y < 4]

I hope this will help you.

Hamidou
  • Thanks a lot! This is faster by about 15% in the current example indeed. Is this all I can squeeze out? I was more hoping I could improve by an order of magnitude. – Britzel Jul 07 '23 at 13:04
  • You can try to use the pandas module. Create a Dataframe which contains all your points. Then you can use the "query" function to find your points. It might be faster as pandas is made to deal with big datas. – Hamidou Jul 07 '23 at 13:39
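The pandas suggestion from the comment above could look like the following sketch (building the DataFrame once and reusing it for every pruning pass; the column names are illustrative):

```python
import pandas as pd
from random import randrange

class Point:
    def __init__(self, x, y):
        self.x, self.y = x, y

points = [Point(randrange(100), randrange(100)) for _ in range(60000)]

# Build the frame once: one numeric column per attribute,
# plus a column holding the original objects.
df = pd.DataFrame({"x": [p.x for p in points],
                   "y": [p.y for p in points],
                   "obj": points})

# Vectorized boolean mask over the numeric columns.
subset = df.loc[(df["x"] < 4) & (df["y"] < 4), "obj"].tolist()

# Equivalent with the suggested query() syntax.
subset_q = df.query("x < 4 and y < 4")["obj"].tolist()
```

Both forms preserve row order; for simple conditions the boolean mask is typically at least as fast as query(), which parses its expression string.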

You must minimize the number of points you are testing.

For this, use a technique from 2D/3D graphics called culling.

You create a quad tree that splits your coordinate range of points into 4 regions, and each level again into 4, or split only when a box holds more than N points.

Each node has a bounding box that encloses all the points contained in this node or any of its sub-nodes.

You apply your test to the bounding box of a node to determine whether any of its points might pass the test; if not, you can skip all the (sub-)nodes.

You fill the empty quad tree point by point: the root node's bounding box encloses all the points; you add points to a node and split the node into 4 when it reaches N points. This creates a fine mesh where your points are clustered, while other nodes have large bounding boxes and contain only a few points.

You can do the same in 3D with an oct tree, splitting a bounding box into 8 regions.

Culling is usually done with triangles, in which case the bounding boxes can overlap slightly, or a bounding sphere is used instead for easier testing, if you need to inspect/process the items in a box.


You can also look into the data structures called Range Tree and R-Tree.
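A minimal sketch of the quad tree described above (the class and method names are illustrative, and the capacity N = 8 is an arbitrary choice). Leaves below unit width are never split, which avoids infinite splitting when many points share the same integer coordinates:

```python
from random import randrange

class Point:
    def __init__(self, x, y):
        self.x, self.y = x, y

class QuadTree:
    """Node covering the half-open box [x0, x1) x [y0, y1)."""
    CAPACITY = 8  # split a leaf once it holds this many points

    def __init__(self, x0, y0, x1, y1):
        self.x0, self.y0, self.x1, self.y1 = x0, y0, x1, y1
        self.points = []      # points stored in this leaf
        self.children = None  # four sub-quadrants once split

    def insert(self, p):
        if not (self.x0 <= p.x < self.x1 and self.y0 <= p.y < self.y1):
            return False  # point lies outside this node's box
        if self.children is None:
            # Keep the point here if there is room, or if the box is
            # already unit-sized (duplicates would split forever).
            if len(self.points) < self.CAPACITY or self.x1 - self.x0 <= 1:
                self.points.append(p)
                return True
            self._split()
        return any(c.insert(p) for c in self.children)

    def _split(self):
        mx, my = (self.x0 + self.x1) / 2, (self.y0 + self.y1) / 2
        self.children = [QuadTree(self.x0, self.y0, mx, my),
                         QuadTree(mx, self.y0, self.x1, my),
                         QuadTree(self.x0, my, mx, self.y1),
                         QuadTree(mx, my, self.x1, self.y1)]
        pts, self.points = self.points, []
        for p in pts:  # push existing points down into the quadrants
            any(c.insert(p) for c in self.children)

    def query(self, qx0, qy0, qx1, qy1, out=None):
        """Collect all points inside [qx0, qx1) x [qy0, qy1)."""
        if out is None:
            out = []
        # Culling step: skip this whole subtree if its bounding box
        # does not overlap the query box.
        if qx0 >= self.x1 or qx1 <= self.x0 or qy0 >= self.y1 or qy1 <= self.y0:
            return out
        out.extend(p for p in self.points
                   if qx0 <= p.x < qx1 and qy0 <= p.y < qy1)
        if self.children:
            for c in self.children:
                c.query(qx0, qy0, qx1, qy1, out)
        return out

tree = QuadTree(0, 0, 100, 100)
points = [Point(randrange(100), randrange(100)) for _ in range(60000)]
for p in points:
    tree.insert(p)

subset = tree.query(0, 0, 4, 4)  # all points with x < 4 and y < 4
```

Building the tree costs time up front, so this pays off when the same set of points is pruned many times with different rectangular criteria.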

rioV8