
I'm developing a spatial ranking application using GeoDjango + PostGIS. It retrieves all geometries within the query bounding box, computes a similarity score for each one using a custom function I wrote, and then returns the shapes with the highest scores.

Currently, the round-trip time of each query is very slow. Profiling shows that the bottleneck is threadsafe.py, which is called by the GEOSGeometry operations (e.g. intersects, union, contains) inside my similarity function. Here is an example profiler result from a single query. It looks like the thread-safe nature of GEOSGeometry is what is causing the performance issue. Individually, a 40 ms operation doesn't seem like a big deal, but because the number of shapes to compare against the query is usually large (~1000 shapes), the 40 ms operations add up to 40 seconds.

Therefore, my question is: how can I optimize the function to minimize the turnaround time? Some of my initial ideas are:

  1. Turn off or avoid the thread-safety checking of GEOSGeometry, as these objects are transient and are not shared with any other thread. This would be the ideal case, if possible, as the majority of the time is currently spent in threadsafe.py.
  2. Use another geometry API which isn't thread-safe.
  3. Perform the spatial operations at the PostGIS level instead of the object level. This would make the code look ugly, though. (Update: this option doesn't work. The overhead of the SQL queries alone makes the operation even slower.)

What are your thoughts?

ejel
  • I tried using `GDALGeometry`, which comes with GeoDjango, as an alternative to `GEOSGeometry`. `GDALGeometry` turns out to rely on threadsafe.py as well and, as a result, performs even worse. – ejel Jul 25 '11 at 21:39

2 Answers


We switched to using Shapely for our GEOS operations. It got us around the threadsafe issue.

FYI, Shapely uses (long, lat) coordinate order, not (lat, long) like GeoDjango does.
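To give a rough idea of what the switch could look like, here is a minimal sketch. It assumes Shapely is installed, and it uses an overlap ratio (intersection area over union area) purely as a placeholder for the custom similarity function, which the question does not show. The helper names `overlap_ratio`, `rank_by_similarity`, and `from_geos` are made up for illustration; `from_geos` assumes the GeoDjango geometry exposes its WKB through the `wkb` property.

```python
from shapely import wkb
from shapely.geometry import box


def from_geos(geos_geometry):
    """Convert a GeoDjango GEOSGeometry to a Shapely geometry via WKB.

    Assumption: geos_geometry.wkb returns a buffer/memoryview of WKB bytes.
    """
    return wkb.loads(bytes(geos_geometry.wkb))


def overlap_ratio(a, b):
    """Placeholder similarity score: intersection area over union area."""
    union_area = a.union(b).area
    if union_area == 0:
        return 0.0
    return a.intersection(b).area / union_area


def rank_by_similarity(query, candidates, top_n=10):
    """Score every candidate against the query and return the top_n pairs."""
    scored = [(overlap_ratio(query, shape), shape) for shape in candidates]
    # Sort on the score only, so the geometries themselves are never compared.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_n]


if __name__ == "__main__":
    query = box(0, 0, 2, 2)  # box(minx, miny, maxx, maxy)
    shapes = [box(1, 1, 3, 3), box(0, 0, 2, 2), box(5, 5, 6, 6)]
    for score, shape in rank_by_similarity(query, shapes, top_n=2):
        print(round(score, 3), shape.bounds)
```

In a GeoDjango view, the candidates would presumably be built once per query with something like `[from_geos(obj.geom) for obj in queryset]`, where `geom` is whatever the geometry field happens to be called.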

Community
  • As Shapely depends on GEOS for its spatial operations, I thought that it would suffer from the same problem. Interesting. I'll give it a try. – ejel Sep 18 '11 at 04:32

Actually, threadsafe.py just wraps each call to the underlying C functions. For a better idea of where your bottlenecks are, look at the cumtime column. See the pstats documentation for a description of the columns: http://docs.python.org/library/profile.html#module-pstats.
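As a rough illustration of the difference between the two columns, here is a small sketch using only the standard library. The functions `c_call_stand_in` and `thin_wrapper` are hypothetical stand-ins, not GeoDjango code. For a function the profiler records, tottime excludes time spent in recorded sub-calls, while cumtime includes it. The `times_for` helper reads the `stats` dictionary of `pstats.Stats`, which maps `(file, line, name)` to `(cc, nc, tottime, cumtime, callers)`.

```python
import cProfile
import pstats


def c_call_stand_in(n):
    """Hypothetical stand-in for the real work done by the wrapped function."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def thin_wrapper(n):
    """Hypothetical stand-in for a wrapper that only forwards the call."""
    return c_call_stand_in(n)


def profile_calls(calls=200, n=2000):
    """Profile repeated wrapper calls and return the collected pstats.Stats."""
    profiler = cProfile.Profile()
    profiler.enable()
    for _ in range(calls):
        thin_wrapper(n)
    profiler.disable()
    return pstats.Stats(profiler)


def times_for(stats, func_name):
    """Return (tottime, cumtime) for the profiled function named func_name."""
    for (_file, _line, name), entry in stats.stats.items():
        if name == func_name:
            _cc, _nc, tottime, cumtime, _callers = entry
            return tottime, cumtime
    raise KeyError(func_name)


if __name__ == "__main__":
    stats = profile_calls()
    # Sort by cumulative time so the most expensive call chains come first.
    stats.sort_stats("cumulative").print_stats(5)
    print("thin_wrapper (tottime, cumtime):", times_for(stats, "thin_wrapper"))
```

Sorting the real profile the same way, and then again with `sort_stats("tottime")`, should show whether the time is spent inside the wrapper's own frame or further down the call chain.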

arghbleargh
  • You're correct that `threadsafe.py` is a wrapper around C functions. I've double-checked the profiler result, and it's still true that `threadsafe.py` is the bottleneck. I mean, it gets called every time a spatial operation (e.g. intersects, within, union) is called. The time actually spent in these operations is so small compared to the time spent in threadsafe. That's why I was thinking that if I could avoid the thread-safety handling altogether, the problem would be solved. – ejel Jul 06 '11 at 06:33