
I am currently working with a large dataset containing coordinates, and I want to validate the quality of the data.

The df contains coordinates from all over Europe. To validate the quality, I want to calculate each point's deviation from the nearest road, since the data are sent from vehicles.

What I have done so far is download the europe.osm file and filter it (with osmfilter) so that it contains roads only. The filtered OSM file is about 2 GB.

Next I want to use osmnx to create a graph from the file:

import osmnx

G = osmnx.graph.graph_from_xml('../europe-roads.osm', simplify=True, retain_all=False)

Here my first problem starts: it seems that osmnx can't handle a file of this size. It only works with smaller files (single cities).

What I want to do in the end is use the get_nearest_edge() function to calculate the distance to the nearest edge:

orig_edge = osmnx.distance.get_nearest_edge(G, (52.393214, 13.133295), return_geom=False, return_dist=True)
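
For a city-sized extract this part already works for me. A minimal version of the per-point check (the file name is just an example; I project the graph and the query point first, because get_nearest_edge measures in the units of the graph's CRS, so an unprojected graph returns degrees rather than metres; in osmnx >= 1.1 this function was replaced by osmnx.distance.nearest_edges):

import osmnx as ox
from shapely.geometry import Point

# Assumption: a city-sized road extract that graph_from_xml can handle.
G_city = ox.graph.graph_from_xml('../potsdam-roads.osm', simplify=True, retain_all=False)

# Project the graph so edge distances come back in metres instead of degrees.
G_proj = ox.project_graph(G_city)

# Project the query point into the same CRS as the graph.
pt, _ = ox.projection.project_geometry(Point(13.133295, 52.393214),
                                       to_crs=G_proj.graph['crs'])

# get_nearest_edge expects the point as (lat, lng), i.e. (y, x).
*edge, dist = ox.distance.get_nearest_edge(G_proj, (pt.y, pt.x),
                                           return_dist=True)
print(edge, dist)  # dist is in metres because the graph is projected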

My idea now was to drop all nodes from the OSM file, which would bring it down to about half the size (since I only need the edges). However, I can't create a graph from an OSM file with the nodes removed: the ways only store node IDs, so the edge geometry lives in the nodes.

Any ideas on how to solve this problem?

In the end, what counts is that I have a way to measure the distance to the nearest road for coordinates all over Europe.

YannickAaron
  • How about splitting the data, one file per geohash for example? Then your code could check which geohashes are closest to the location, load those graphs, run the nearest edge function, and then let you choose the best. If the geohashes are small enough, the files would be fairly fast to load (a sketch of this idea follows the comments). – matthieun Sep 30 '20 at 17:01
  • Can't imagine how it would work! Could you provide some more information? – YannickAaron Sep 30 '20 at 17:18
  • I figured it out. Thanks a lot! – YannickAaron Oct 01 '20 at 06:32
  • Could you say how you did it? – Scipio Mar 20 '23 at 23:47
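
For later readers, here is a minimal sketch of the geohash-splitting idea from the comments. It is not the asker's confirmed solution: the pygeohash package, the graphs/ directory layout, and all the names here are assumptions, and it presumes the Europe extract has been pre-split into one GraphML file per geohash cell.

import osmnx as ox
import pygeohash  # assumption: any geohash encoder would do

GRAPH_DIR = 'graphs'  # hypothetical layout: one pre-built graph per geohash cell
PRECISION = 3         # 3-character cells cover roughly 156 x 156 km

_cache = {}

def load_cell(lat, lng):
    """Load (and cache) the pre-built road graph covering this point."""
    cell = pygeohash.encode(lat, lng, precision=PRECISION)
    if cell not in _cache:
        _cache[cell] = ox.load_graphml(f'{GRAPH_DIR}/{cell}.graphml')
    return _cache[cell]

def deviation(lat, lng):
    """Distance from the point to the nearest road edge in its cell."""
    G = load_cell(lat, lng)
    *_, dist = ox.distance.get_nearest_edge(G, (lat, lng), return_dist=True)
    return dist

As in the snippet in the question, the per-cell graphs and the points would need to be projected for the deviation to come out in metres. Also, near a cell border the true nearest road may lie in a neighbouring cell, so checking the adjacent cells as matthieun suggests makes this more robust.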

0 Answers