
I have a DataFrame containing Chicago addresses, which I've geocoded into latitude and longitude values and then into Point objects (making the DataFrame a GeoDataFrame). A small fraction have been incorrectly geocoded, with lat/long values outside of Chicago. I have a shapefile of Chicago's boundary (loaded as a GeoDataFrame), and I want to select all rows whose Points fall outside Chicago's boundary polygon.

It would be easy to select all points within the polygon (via GeoPandas' `sjoin` function), but I haven't found a good way to select the points that are not within the polygon. Does one exist?

MattTriano



If you convert the Chicago boundary GeoDataFrame to a single polygon, e.g. with:

chicago = df_chicago.geometry.unary_union
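A minimal sketch of that dissolve step, using two hypothetical adjacent squares in place of the real Chicago shapefile. GeoPandas 1.0 deprecated `unary_union` in favor of `union_all()`, so the sketch uses whichever one the installed version provides; either way the result is a single Shapely geometry, not a GeoDataFrame.

```python
import geopandas as gpd
from shapely.geometry import Polygon

# Hypothetical stand-in for the Chicago boundary shapefile:
# two adjacent unit squares stored as separate rows.
df_chicago = gpd.GeoDataFrame(
    geometry=[
        Polygon([(0, 0), (1, 0), (1, 1), (0, 1)]),
        Polygon([(1, 0), (2, 0), (2, 1), (1, 1)]),
    ]
)

# Dissolve all rows into one Shapely (Multi)Polygon.
# union_all() exists in GeoPandas >= 1.0; unary_union is the older spelling.
geom = df_chicago.geometry
chicago = geom.union_all() if hasattr(geom, "union_all") else geom.unary_union

print(chicago.geom_type, chicago.area)
```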

then you can use boolean indexing with the `within` predicate to select the points inside and outside of Chicago:

within_chicago = df[df.geometry.within(chicago)]
outside_chicago = df[~df.geometry.within(chicago)]

using `~` to invert the boolean mask.
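A self-contained sketch of that filter, with a hypothetical 10 x 10 square standing in for Chicago and four made-up points (two inside, two outside). The mask is computed once and reused for both selections.

```python
import geopandas as gpd
from shapely.geometry import Point, Polygon

# Hypothetical boundary: a single square standing in for Chicago.
chicago = Polygon([(0, 0), (10, 0), (10, 10), (0, 10)])

# Hypothetical geocoded addresses: two inside, two mis-geocoded outside.
df = gpd.GeoDataFrame(
    {"address": ["a", "b", "c", "d"]},
    geometry=[Point(2, 3), Point(5, 5), Point(50, 50), Point(-1, 4)],
)

# Boolean mask: True where the point lies inside the boundary.
inside_mask = df.geometry.within(chicago)

within_chicago = df[inside_mask]
outside_chicago = df[~inside_mask]  # ~ flips every True/False in the mask

print(within_chicago["address"].tolist())   # ['a', 'b']
print(outside_chicago["address"].tolist())  # ['c', 'd']
```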

Alternatively, you could use the disjoint spatial predicate:

outside_chicago = df[df.geometry.disjoint(chicago)]
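The `disjoint` filter and the `~within` filter are not exact complements of each other. A point lying exactly on the boundary is neither `within` the polygon nor `disjoint` from it, so `~within` counts it as outside while `disjoint` does not. A small sketch with a hypothetical square and three made-up points:

```python
import geopandas as gpd
from shapely.geometry import Point, Polygon

# Hypothetical boundary square.
chicago = Polygon([(0, 0), (10, 0), (10, 10), (0, 10)])

# One point inside, one exactly on the right-hand edge, one outside.
df = gpd.GeoDataFrame(
    {"address": ["inside", "on_edge", "outside"]},
    geometry=[Point(5, 5), Point(10, 5), Point(20, 5)],
)

not_within = df[~df.geometry.within(chicago)]   # boundary point counts as outside
disjoint = df[df.geometry.disjoint(chicago)]    # boundary point touches, so excluded

print(not_within["address"].tolist())  # ['on_edge', 'outside']
print(disjoint["address"].tolist())    # ['outside']
```

For geocoded street addresses, points landing exactly on the boundary line should be rare, so in practice the two usually agree.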
joris
  • I don't know why, but `df[~df.geometry.within(chicago)]` tells me that all my points are outside my polygon (which is a single multipolygon, BTW), while `df[df.geometry.disjoint(chicago)]` gives me the expected result (in my case, all points are inside the polygon, so my `outside_chicago` is empty). – umbe1987 Apr 09 '21 at 13:27
  • @umbe1987 Did your `df` and `df_chicago` have the same coordinate reference system? If they had different CRSs, the points for one could, for example, be measured in degrees while the points for the other could be in meters from (0,0), which would cause points from one to fall far outside of the bounding polygon from the other. You can check via `df.crs` and `df_chicago.crs`, and if they're different, you can update the geometry in one via `df_chicago = df_chicago.to_crs(df.crs)`. – MattTriano Feb 01 '22 at 18:14
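A sketch of the CRS pitfall described above, with hypothetical data: a point near downtown Chicago in lon/lat degrees (EPSG:4326) and a rough box around the city stored in Web Mercator meters (EPSG:3857). Because `unary_union` returns a plain Shapely geometry with no CRS attached, GeoPandas cannot warn about the mismatch, so the test silently returns the wrong answer until the boundary is reprojected.

```python
import geopandas as gpd
from shapely.geometry import Point, Polygon

# Hypothetical geocoded point in lon/lat degrees, near downtown Chicago.
df = gpd.GeoDataFrame(geometry=[Point(-87.63, 41.88)], crs="EPSG:4326")

# Hypothetical boundary: a rough box around Chicago, converted to
# Web Mercator meters as if it had been read from a different source.
box = Polygon([(-88.0, 41.6), (-87.5, 41.6), (-87.5, 42.1), (-88.0, 42.1)])
df_chicago = gpd.GeoDataFrame(geometry=[box], crs="EPSG:4326").to_crs("EPSG:3857")

# Degrees tested against meters: the point is nowhere near the polygon.
wrong_result = df.geometry.within(df_chicago.geometry.iloc[0]).tolist()

# Reproject the boundary into the points' CRS before testing.
if df.crs != df_chicago.crs:
    df_chicago = df_chicago.to_crs(df.crs)
right_result = df.geometry.within(df_chicago.geometry.iloc[0]).tolist()

print(wrong_result, right_result)  # [False] [True]
```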