3

I have a set of ~36,000 polygons which represent a partition (~counties) of the country. My python script receives a lot of points: pointId, longitude, latitude.

For each point, I want to send back pointId, polygonId. For each point, looping into all the polygons and use myPoint.within(myPolygon) is quite inefficient.

I suppose the shapely library offers a better way to prepare the polygon so that finding the polygon for a point becomes a tree path (country, region, sub region, ...)

Here is my code so far:

import sys
import os
import json
import time
import string
import uuid

py_id = str(uuid.uuid4())

sys.stderr.write(py_id + '\n')
sys.stderr.write('point_in_polygon.py V131130a.\n')
sys.stderr.flush()

from shapely.geometry import Point
from shapely.geometry import Polygon
import sys
import json
import string
import uuid
import time

jsonpath='.\cantons.json'
jsonfile = json.loads(open(jsonpath).read())

def find(id, obj):
    results = []

    def _find_values(id, obj):
        try:
            for key, value in obj.iteritems():
                if key == id:
                    results.append(value)
                elif not isinstance(value, basestring):
                    _find_values(id, value)
        except AttributeError:
            pass

        try:
            for item in obj:
                if not isinstance(item, basestring):
                    _find_values(id, item)
        except TypeError:
            pass

    if not isinstance(obj, basestring):
        _find_values(id, obj)
    return results


#1-Load polygons from json  
r=find('rings',jsonfile)
len_r = len(r)

#2-Load attributes from json
a=find('attributes',jsonfile)

def insidePolygon(point,json):
        x=0
    while x < len_r :
            y=0
            while y < len(r[x]) :
        p=Polygon(r[x][y])
        if(point.within(p)):
            return a[y]['OBJECTID'],a[y]['NAME_5']
        y=y+1
        x=x+1
    return None,None


while True:
    line = sys.stdin.readline()
    if not line:
        break
    try:
        args, tobedropped = string.split(line, "\n", 2)

        #input: vehicleId, longitude, latitude
        vehicleId, longitude, latitude = string.split(args, "\t")

        point = Point(float(longitude), float(latitude))
        cantonId,cantonName = insidePolygon(point,r)

        #output: vehicleId, cantonId, cantonName
        # vehicleId will be 0 if not found
        # vehicleId will be < 0 in case of an exception
        if (cantonId == None):
            print "\t".join(["0", "", ""])
        else:
            print "\t".join([str(vehicleId), str(cantonId), str(cantonName)])

    except ValueError:
        print "\t".join(["-1", "", ""])
        sys.stderr.write(py_id + '\n')
        sys.stderr.write('ValueError in Python script\n')
        sys.stderr.write(line)
        sys.stderr.flush()
    except:
        sys.stderr.write(py_id + '\n')
        sys.stderr.write('Exception in Python script\n')
        sys.stderr.write(str(sys.exc_info()[0]) + '\n')
        sys.stderr.flush()
        print "\t".join(["-2", "", ""])
maininformer
  • 967
  • 2
  • 17
  • 31
benjguin
  • 1,496
  • 1
  • 12
  • 21
  • 1
    If the solution doesn't have to tied to a specific library, I'd just suggest making a regular 2D grid, where each element is a list of candidate polygons to search. A 200x200 grid could be good starting point reducing the average number of polygons to search to close to 1. – Aki Suihkonen Nov 30 '13 at 08:08
  • @Aswin, I added the code in the question. – benjguin Nov 30 '13 at 08:15
  • See my answer to a similar question at: https://stackoverflow.com/questions/52281055/point-in-polygon-using-shapely/73480672#73480672 – foxtrotmikew Aug 25 '22 at 16:08

2 Answers2

7

Use Rtree (examples) as R-tree index to: (1) index the bounds of the 36k polygons (do this just after reading jsonfile), then (2) very quickly find the intersecting bounding boxes of each polygon to your point of interest. Then, (3) for the intersecting bounding boxes from Rtree, use shapely to use, e.g. point.within(p) to do the actual point-in-polygon analysis. You should see a massive leap of performance with this technique.

Mike T
  • 41,085
  • 18
  • 152
  • 203
4

It's great,

Here is sample code :

polygons_sf = shapefile.Reader("<shapefile>")
polygon_shapes = polygons_sf.shapes()
polygon_points = [q.points for q in polygon_shapes ]
polygons = [Polygon(q) for q in polygon_points]
idx = index.Index()
count = -1
for q in polygon_shapes:
    count +=1
    idx.insert(count, q.bbox)
[...]
for j in idx.intersection([point.x, point.y]):
    if(point.within(polygons[j])):
        geo1, geo2 = polygons_sf.record(j)[0], polygons_sf.record(j)[13]
        break

Thanks

  • Careful of the `break` statement in the loop though. If you have overlapping shapes, and you want to count the number of points in each shape, this will give a smaller count. Just wasted hours debugging the code just to realize that. – Ash Catchem Jul 20 '15 at 19:53