
I am visualizing data points from a dataset of Uber drives in New South Wales, Australia, on a Folium map, but some of the points fall in the sea instead of on the mainland. I tried using a polygon dataset of New South Wales so that I could use GeoDataFrame.sjoin on both datasets with the "contains" predicate. However, some of the points are still outside the polygon. How can I solve this?

This is my code:

geo_df = gpd.GeoDataFrame(df, geometry=gpd.points_from_xy(df.pick_up_lon, df.pick_up_lat), crs="EPSG:4326")
poly_df = gpd.read_file("state.geojson")
join_df = gpd.sjoin(poly_df, geo_df, predicate="contains")

[1] Before applying the polygon: https://i.stack.imgur.com/9yF82.jpg

[2] After applying the polygon and the sjoin (points are still visible outside the polygon, in the sea): https://i.stack.imgur.com/f6FgR.jpg

Ali_Khaled

1 Answer

The operation you want is clip(): clip the points so that only those within your required boundary remain. The code below shows this, structured in a similar way to your code.

import geopandas as gpd
import numpy as np
import pandas as pd

# get some geojson for NSW
poly_df = (
    gpd.read_file(
        "https://github.com/tonywr71/GeoJson-Data/raw/master/suburb-2-nsw.geojson"
    )
    .dissolve()
    .loc[:, ["geometry"]]
)

# some of these points will be in the sea....
r = np.random.RandomState(22)
df = pd.DataFrame(
    {
        "pick_up_lon": r.choice(np.linspace(*poly_df.total_bounds[[0, 2]], 100), 30),
        "pick_up_lat": r.choice(np.linspace(*poly_df.total_bounds[[1, 3]], 100), 30),
    }
)

geo_df = gpd.GeoDataFrame(
    df, geometry=gpd.points_from_xy(df.pick_up_lon, df.pick_up_lat), crs="EPSG:4326"
)

# remove points outside NSW
geo_df.clip(poly_df).explore()
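
If you would rather stay with sjoin, one possible variant is sketched below. Note that sjoin keeps the geometry of the left frame, so with the polygon on the left (as in the question) the result carries polygon geometry rather than points. Putting the points on the left with the "within" predicate keeps point geometry and drops anything outside the boundary. The rectangle and the three pickup coordinates here are made-up stand-ins, not the real NSW boundary or Uber data.

```python
import geopandas as gpd
import pandas as pd
from shapely.geometry import box

# Hypothetical stand-in for the NSW boundary: a simple lon/lat rectangle.
poly_df = gpd.GeoDataFrame(
    geometry=[box(150.0, -34.0, 151.0, -33.0)], crs="EPSG:4326"
)

# Hypothetical pickups: the first two lie inside the rectangle, the third outside.
df = pd.DataFrame(
    {
        "pick_up_lon": [150.5, 150.9, 151.5],
        "pick_up_lat": [-33.5, -33.1, -33.5],
    }
)
geo_df = gpd.GeoDataFrame(
    df, geometry=gpd.points_from_xy(df.pick_up_lon, df.pick_up_lat), crs="EPSG:4326"
)

# Points on the left: the result keeps point geometry, and the inner join
# drops every point that is not within the polygon.
inside = gpd.sjoin(geo_df, poly_df, predicate="within", how="inner")

# Drop the helper column holding the matched polygon's index, if present.
inside = inside.drop(columns="index_right", errors="ignore")
```

Here inside is still a GeoDataFrame of points, so it can be plotted the same way as geo_df. For a single dissolved boundary, clip() as shown above is the shorter route; the sjoin form may be handy if you also want attributes from the polygon frame.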
Rob Raymond
  • Thanks, I will try it once I return to that project in the future. I kinda lost hope in the middle of it and started another project. – Ali_Khaled Oct 04 '22 at 09:12