I have a large collection (~160 million pairs) of latitude and longitude values for e-scooter locations. Each lat/lon is given to 6 or more decimal places, so many points lie very close to one another. I am trying to prune this dataset so that points within a certain distance of one another (ideally 1-5 m) are combined into a single new point. I am also trying to remove outliers (points that are not close to any other point) that occur due to GPS errors. The end goal is to use this pruned structure to build a GNN for route prediction.
I have tried rounding the points to 4dp, hashing them into a hash table, and taking the mean of the points that land in the same slot, so that all points in the same cell collapse into a single new point. (I was aiming for a cell of ~1.1 m, but 4dp is actually about 11 m of latitude; ~1.1 m corresponds to 5dp.) However, because outliers are present, this has caused new points either to sit inside buildings or to be moved off footpaths and into the middle of roads.
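For reference, here is a minimal sketch of the rounding-and-averaging approach described above, with one possible change: cells containing fewer than a threshold number of points are dropped, so isolated GPS fixes do not produce a merged point. The function name `grid_prune`, the parameters `cell_m`, `min_count` and `ref_lat`, and the constant of roughly 111,320 m per degree of latitude are all assumptions made for illustration, not part of my actual code. It uses cells sized in metres rather than decimal places, and a flat-earth approximation that should only be reasonable over a city-sized area.

```python
import math
from collections import defaultdict

# Approximate metres per degree of latitude (assumed constant for this sketch).
M_PER_DEG_LAT = 111_320.0


def grid_prune(points, cell_m=2.0, min_count=3, ref_lat=None):
    """Merge (lat, lon) points that share a grid cell of about cell_m metres.

    Cells holding fewer than min_count points are treated as outliers and
    dropped. Returns a list of (mean_lat, mean_lon, count) tuples.
    """
    points = list(points)
    if not points:
        return []

    # Longitude degrees shrink with latitude; use one reference latitude
    # for the whole dataset (hypothetical simplification).
    if ref_lat is None:
        ref_lat = sum(lat for lat, _ in points) / len(points)
    m_per_deg_lon = M_PER_DEG_LAT * math.cos(math.radians(ref_lat))

    # cell index -> [sum of lat, sum of lon, number of points]
    cells = defaultdict(lambda: [0.0, 0.0, 0])
    for lat, lon in points:
        key = (
            math.floor(lat * M_PER_DEG_LAT / cell_m),
            math.floor(lon * m_per_deg_lon / cell_m),
        )
        acc = cells[key]
        acc[0] += lat
        acc[1] += lon
        acc[2] += 1

    # Keep only well-populated cells and replace each by its mean position.
    return [
        (s_lat / n, s_lon / n, n)
        for s_lat, s_lon, n in cells.values()
        if n >= min_count
    ]


if __name__ == "__main__":
    sample = [
        (51.500010, -0.100010),
        (51.500011, -0.100011),
        (51.500012, -0.100012),
        (51.600000, -0.200000),  # isolated fix, far from the others
    ]
    for merged in grid_prune(sample, cell_m=5.0, min_count=2):
        print(merged)
```

This still has the limitation of any fixed grid: two points less than `cell_m` apart can fall either side of a cell boundary and will not be merged. At 160 million points a pure-Python loop like this would probably also need to be vectorised or run in chunks.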