I'm developing a spatial ranking application using GeoDjango + PostGIS. It retrieves all geometries within the query bounding box, computes a similarity score for each using a custom function I wrote, and then returns the shapes with the highest scores.
Currently, the round-trip time of each query is very slow. Profiling shows that the bottleneck is in threadsafe.py, which is called by the GEOSGeometry operations (intersects, union, contains, etc.) inside my similarity function. Here is an example profiler result from a single query. It looks like the thread-safe nature of GEOSGeometry is what is causing the performance issue. Individually, an operation taking 40 ms doesn't seem like a big deal, but because the number of shapes to compare against the query is usually large (around 1000 shapes), a 40 ms operation adds up to 40 seconds per query.
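For reference, here is a minimal sketch of how a per-file breakdown like this could be gathered with the standard cProfile and pstats modules. The names similarity, rank, and time_by_file are hypothetical, and plain 1-D intervals stand in for the real geometries so the snippet runs without Django; in the real application the similarity function would call GEOSGeometry operations instead.

```python
import cProfile
import pstats
from collections import defaultdict


def similarity(query, candidate):
    # Hypothetical stand-in for the real similarity function: an
    # overlap-over-extent score on 1-D intervals (start, end). The real
    # version would use GEOSGeometry operations instead.
    overlap = max(0.0, min(query[1], candidate[1]) - max(query[0], candidate[0]))
    span = max(query[1], candidate[1]) - min(query[0], candidate[0])
    return overlap / span if span else 0.0


def rank(query, candidates, top_n=10):
    # Score every candidate against the query and keep the best top_n.
    scored = [(similarity(query, c), c) for c in candidates]
    scored.sort(reverse=True)
    return scored[:top_n]


def time_by_file(profiler):
    # Sum the time spent inside each source file (excluding callees),
    # which is how a hotspot such as threadsafe.py would show up.
    totals = defaultdict(float)
    stats = pstats.Stats(profiler)
    for (filename, _line, _func), (_cc, _nc, tottime, _ct, _callers) in stats.stats.items():
        totals[filename] += tottime
    return dict(totals)


candidates = [(i, i + 5) for i in range(1000)]

profiler = cProfile.Profile()
profiler.enable()
top = rank((0, 10), candidates)
profiler.disable()

per_file = time_by_file(profiler)

# Back-of-the-envelope cost from the figures above:
# 1000 shapes at 40 ms each.
n_shapes, per_call_ms = 1000, 40
estimated_total_s = n_shapes * per_call_ms / 1000
```

Sorting per_file by value puts the most expensive files first, which makes it easy to see whether a single module dominates the query time.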
Therefore, my question is: how can I optimize the function to minimize the turnaround time? Some of my initial ideas are:
- Turn off or avoid the thread-safety checking of GEOSGeometry, as these objects are transient and are not shared with any other thread. This would be the ideal case, if possible, as the majority of the time is now spent in threadsafe.py.
- Use another geometry API which isn't thread-safe.
- Perform spatial operations at the PostGIS level instead of the object level. This will make the code look ugly, though. (Update: This option doesn't work. The overhead of the SQL queries alone makes the operation even slower.)
What are your thoughts?