7

I am working with GeoPandas and I have two GeoDataframes with the same CRS. One of them contains a geometry column with a polygon geometry, the other one a column with point geometry. I want to check which points are inside the polygon.

Naively I tried

shape.contains(points)

This gave me

>  The indices of the two GeoSeries are different

I do not understand this message. When I check the documentation, it says

We can also check two GeoSeries against each other, row by row. The GeoSeries above have different indices. We can either align both GeoSeries based on index values and compare elements with the same index using align=True or ignore index and compare elements based on their matching order using align=False:

What are these Indices? Why are they checked against each other and not the geometry columns? Online I read, I have to convert my geometries into shapely geometries. But isn't the whole point of using GeoPandas that I can perfrom geographical operations on the data?

I am confused about this. How to check if the geometries in shape contain any of the geometries in points?

four-eyes
  • 10,740
  • 29
  • 111
  • 220

3 Answers3

4

What you are describing is effectively a spatial join. Below example constructs points from lon/lat of cities in UK and then finds which administrational area polygon the city is in. This is an NxM comparison

import pandas as pd
import numpy as np
import geopandas as gpd
import shapely.geometry
import requests

# source some points and polygons
# fmt: off
dfp = pd.read_html("https://www.latlong.net/category/cities-235-15.html")[0]
dfp = gpd.GeoDataFrame(dfp, geometry=dfp.loc[:,["Longitude", "Latitude",]].apply(shapely.geometry.Point, axis=1))
res = requests.get("https://opendata.arcgis.com/datasets/69dc11c7386943b4ad8893c45648b1e1_0.geojson")
df_poly = gpd.GeoDataFrame.from_features(res.json())
# fmt: on

gpd.sjoin(dfp, df_poly)
Rob Raymond
  • 29,118
  • 3
  • 14
  • 30
  • Yes, that is what I am looking for. Intuitively I'd go for `contains()`. Any hints on how to count how many points are in the polygon and attach that number to each row in the polygon dataframe? Something like `gpd.sjoin(dfp, df_poly).groupBy('OBJECTID').count()` – four-eyes Oct 20 '21 at 10:14
  • `df_poly.merge(gpd.sjoin(dfp, df_poly).groupby("index_right").size().rename("points"), left_index=True, right_index=True, how="left")` will put count of points back on polygons geodataframe – Rob Raymond Oct 20 '21 at 10:50
1

What are these Indices?

Simply speaking index is name of row of pandas.DataFrames or entry of pandas.Series. Aligning using indices is useful if you have data which overlap only partially, consider following example: let say you have daily data from two sensors but 2nd was turned on later then you might prepare pandas.DataFrame as follows.

import pandas as pd
s1 = pd.Series({'day1':100,'day2':110,'day3':105,'day4':120,'day5':110})
s2 = pd.Series({'day3':100,'day4':105,'day5':100})
df = pd.DataFrame({'sensor1':s1,'sensor2':s2})
print(df)

output

      sensor1  sensor2
day1      100      NaN
day2      110      NaN
day3      105    100.0
day4      120    105.0
day5      110    100.0

Observe that NaN (denoting unknown value) were inserted for day1 and day2 for sensor2.

Daweo
  • 31,313
  • 3
  • 12
  • 25
  • Thanks for the explanation. That means when I get `The indices of the two GeoSeries are different` that the two `dataframes` are of different length? This is at least what `GeoPandas` tells me when I try `shape.contains(points, align = True)`. – four-eyes Oct 20 '21 at 08:57
  • @Stophface if length are different they will be different, but they might different also for same length dataframes, imagine that you have data from sensor1 for day1, day2, day3 and from sensor2 for day2, day3, day4 - lengths are equal but indices are different – Daweo Oct 20 '21 at 08:59
0

The contains operation doesn't do the full NxM comparison. It can check which of the polygons contain a single point. Or if you provide it with two series it will chack whether the first polygon contains the first point, the second polygon contains the second point, etc..

The error you're getting is geopandas trying the second option, but the points and polygons don't align so it can't do the element-wise comparison.

It sounds like you want the full all polygons times all points comparison. You can do that by iterating over all of the points:

point_series = gpd.GeoSeries([Point(1, 2), Point(2, 1)])

polygon_series = gpd.GeoSeries(
    [
        Polygon([(0, 0), (0, 3), (2, 3), (2, 0), (0, 0)]),
        Polygon([(0, 0), (0, 2), (3, 2), (3, 0), (0, 0)]),
    ]
)

pd.concat([polygon_series.contains(point) for point in point_series], axis=1)

Output:

    0       1
0   True    False
1   False   True
Swier
  • 4,047
  • 3
  • 28
  • 52
  • 1
    Yes, I do want to fill `NxM` comparison. In the polygon dataframe are various polygons and in the points dataframe several points. When I run your code `pd.concat([polygons['geometry'].contains(points['geometry']) for point in points['geometry']], axis = 1)` I still get `The indices of the two GeoSeries are different`. But not just once but for every iteration python does. – four-eyes Oct 20 '21 at 09:35
  • The code snippet you shared uses `.contains(points['geometry'])` instead of `.contains(point)`, which would explain the error. – Swier Oct 20 '21 at 12:49