I am trying to filter the Points in one GeoDataFrame (df1) that are close to Points in a second GeoDataFrame (df2), and vice versa. I use this piece of code for it:

ps1 = []
ps2 = []
for p1 in df1.geometry:
    for p2 in df2.geometry:
        dist = haversine(p1.y,p1.x,p2.y,p2.x)
        if dist < 100:
            ps1.append(p1)
            ps2.append(p2)

df1 = df1[df1.geometry.isin(ps1)]
df2 = df2[df2.geometry.isin(ps2)]

However, I get an error on the last line: TypeError: unhashable type: 'Point'

But the line above it works like a charm, and the data types of both lines (df1/df2 and ps1/ps2) are exactly the same.

How is that possible? And how can it be solved?

EDIT:

types of variables:

df1         :  <class 'geopandas.geodataframe.GeoDataFrame'>
df1.geometry:  <class 'geopandas.geoseries.GeoSeries'>
ps1         :  <class 'list'>
val1        :  <class 'pandas.core.series.Series'>
df2         :  <class 'geopandas.geodataframe.GeoDataFrame'>
df2.geometry:  <class 'geopandas.geoseries.GeoSeries'>
ps2         :  <class 'list'>

EDIT 2:

df1.dtypes
Out[301]: 
lat                     float64
lon                     float64
time        datetime64[ns, UTC]
geometry               geometry
dtype: object

df2.dtypes
Out[302]: 
lat                     float64
lon                     float64
time        datetime64[ns, UTC]
geometry               geometry
dtype: object

MWE:

import pandas as pd
from pandas import Timestamp
import geopandas as gpd
import numpy as np

def haversine(lat1, lon1, lat2, lon2, to_radians=True, earth_radius=6371000):
    """
    Slightly modified version of http://stackoverflow.com/a/29546836/2901002

    Calculate the great circle distance between two points
    on the earth (specified in decimal degrees or in radians)

    All (lat, lon) coordinates must have numeric dtypes and be of equal length.

    """
    if to_radians:
        lat1, lon1, lat2, lon2 = np.radians([lat1, lon1, lat2, lon2])

    a = np.sin((lat2-lat1)/2.0)**2 + \
        np.cos(lat1) * np.cos(lat2) * np.sin((lon2-lon1)/2.0)**2

    return earth_radius * 2 * np.arcsin(np.sqrt(a))

df1 = pd.DataFrame.from_dict({'lat': {0: 52.378851603519905,
  1: 52.37896949048437,
  2: 52.378654032960824,
  3: 52.37818902922923},
 'lon': {0: 4.88585622453752,
  1: 4.886671616078047,
  2: 4.886413945242339,
  3: 4.885995520636016},
 'time': {0: Timestamp('2019-11-05 11:31:42+0000', tz='UTC'),
  1: Timestamp('2019-11-05 11:32:22+0000', tz='UTC'),
  2: Timestamp('2019-11-05 11:32:49+0000', tz='UTC'),
  3: Timestamp('2019-11-05 11:33:31+0000', tz='UTC')}})
df2 = pd.DataFrame.from_dict({'lat': {0: 52.378851603519905,
  1: 52.369466977365214,
  2: 52.36923115238693,
  3: 52.36898222465506},
 'lon': {0: 4.88585622453752,
  1: 4.9121331184582,
  2: 4.912723204441477,
  3: 4.913505393878495},
 'time': {0: Timestamp('2019-11-05 08:54:32+0000', tz='UTC'),
  1: Timestamp('2019-11-05 08:55:06+0000', tz='UTC'),
  2: Timestamp('2019-11-05 08:55:40+0000', tz='UTC'),
  3: Timestamp('2019-11-05 08:56:22+0000', tz='UTC')}})

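# points_from_xy expects x (longitude) first, then y (latitude),
# which matches the haversine(p.y, p.x, ...) calls below
df1 = gpd.GeoDataFrame(df1, geometry=gpd.points_from_xy(df1.lon, df1.lat))
df2 = gpd.GeoDataFrame(df2, geometry=gpd.points_from_xy(df2.lon, df2.lat))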

ps1 = []
ps2 = []
for p1 in df1.geometry:
    for p2 in df2.geometry:
        dist = haversine(p1.y,p1.x,p2.y,p2.x)
        if dist < 100:
            ps1.append(p1)
            ps2.append(p2)

val1 = gpd.GeoDataFrame(df1)
val2 = gpd.GeoDataFrame(df2)
# print(type(df1))
# print(type(df2))
# print(type(ps1))
# print(type(ps2))
print('df1         : ', type(df1))
print('df1.geometry: ', type(df1.geometry))
print('ps1         : ', type(ps1))
val1 = df1.geometry.isin(ps1)
print('val1        : ', type(val1))

print('df2         : ', type(df2))
print('df2.geometry: ', type(df2.geometry))
print('ps2         : ', type(ps2))
val2 = df2.geometry.isin(ps2)
print('val2        : ', type(val2))
# df1 = df1[df1.geometry.isin(ps1)]
# df2 = df2[df2.geometry.isin(ps2)]
hvd2
  • Assign the index values to variables before indexing into `df1` and `df2`. So `val1 = df1.geometry.isin(ps1)` and `val2 = df2.geometry.isin(ps2)`. Then `print(type(val1))` and `print(type(val2))`. See what they actually are. – Tom Karzes Apr 15 '20 at 13:46
  • Please, try to provide a [Minimal, Complete, and Verifiable example](https://stackoverflow.com/help/mcve). – Georgy Apr 15 '20 at 14:36
  • Thanks; I added the result to the original question. – hvd2 Apr 15 '20 at 14:38
  • @Georgy I would like to, but how to include the original data? – hvd2 Apr 15 '20 at 14:44
  • See here: [How to make good reproducible pandas examples](https://stackoverflow.com/q/20109391/7851470) – Georgy Apr 15 '20 at 14:49
  • Try also printing `df1.dtypes` and `df2.dtypes`. I've had weird bugs where a stray string or NaN can change the type of the column, which changes the logic that certain library functions use. – 0x5453 Apr 15 '20 at 14:59
  • `ps1` and `ps2` lists have zero elements each (in the MWE). Not sure what is expected from the `df1.geometry.isin(ps1)` call then. – Keldorn Apr 15 '20 at 15:22
  • @Keldorn Hm, that was a side-effect of MWE. Edited the data such that ps1 and ps2 are nonzero. – hvd2 Apr 15 '20 at 15:27

1 Answer

As the error says, Point is not hashable (since this?).

It turns out that, for a reason I don't know, the pandas.Series.isin function seems to require the data to be hashable. See the question I just posted.
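To illustrate the hashability point in isolation, here is a minimal sketch in plain Python. In Python 3, a class that defines `__eq__` without also defining `__hash__` has its `__hash__` set to `None`, so its instances cannot be hashed. `FakePoint` and `is_hashable` are hypothetical names made up for this example; `FakePoint` is not Shapely's `Point`, and whether Shapely's class is unhashable for exactly this reason in a given version is an assumption.

```python
class FakePoint:
    """Hypothetical stand-in for a geometry object; NOT Shapely's Point."""

    def __init__(self, x, y):
        self.x = x
        self.y = y

    # Defining __eq__ without __hash__ makes Python set __hash__ to None.
    def __eq__(self, other):
        return isinstance(other, FakePoint) and (self.x, self.y) == (other.x, other.y)


def is_hashable(obj):
    """Return True if hash(obj) succeeds, False if it raises TypeError."""
    try:
        hash(obj)
    except TypeError:
        return False
    return True


p = FakePoint(4.88, 52.37)
q = FakePoint(4.88, 52.37)

# List membership only needs ==, so it works for unhashable objects.
in_list = q in [p]

# Anything hash-based (set, dict key, hash-table lookup) fails instead.
hashable = is_hashable(p)
```

This is why a membership test against a list can succeed while a hash-based lookup on the same objects raises `TypeError: unhashable type`.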

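As for your question, a workaround would be to build the boolean mask with a list comprehension and convert it back to a Series (passing the original index so the mask stays aligned with the frame), like:

val2 = pd.Series([v in ps2 for v in df2.geometry], index=df2.index)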
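A possible variation on this workaround, sketched under assumptions: instead of comparing the geometry objects themselves, compare hashable coordinate tuples. In the sketch below, plain `(x, y)` tuples stand in for the points, and the names `selected`, `candidates`, `selected_keys`, and `mask` are made up for illustration. The coordinates are shortened versions of the values in the question.

```python
# Points represented as (x, y) tuples, standing in for (p.x, p.y) of each geometry.
selected = [(4.885856, 52.378851)]  # plays the role of ps2
candidates = [                      # plays the role of df2.geometry
    (4.885856, 52.378851),
    (4.912133, 52.369466),
    (4.912723, 52.369231),
]

# Tuples of floats are hashable, so they can go into a set for fast lookup.
selected_keys = set(selected)

# One boolean per candidate: True if its coordinates were selected.
mask = [pt in selected_keys for pt in candidates]
```

With GeoPandas, the keys could presumably be built as `(p.x, p.y)` for each point, and the resulting list wrapped in a Series that shares the frame's index before using it as a filter. Note that this relies on exact float equality, which holds here only because the selected points come from the same data.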
Keldorn
  • The version corresponds with mine (1.6.4.post2). The workaround works, thanks for that! However, I still don't understand why `df1.geometry.isin(ps1)` does not fail, while `df2.geometry.isin(ps2)` does... – hvd2 Apr 15 '20 at 18:33