
I'm trying to do some GIS work using R. Specifically, I have a SpatialPointsDataFrame (called 'points') and a SpatialLinesDataFrame (called 'lines'). I want to know the closest line to each point. I do this:

library(rgeos)  # provides gDistance()

# Make a new field to hold the line ID
points@data$nearest_line <- as.character('')

# Loop through the data. For each point, get the ID of the nearest line and store it
for (i in seq_len(nrow(points))) {
  points@data[i, "nearest_line"] <-
    lines[which.min(gDistance(points[i, ], lines, byid = TRUE)), ]@data$line_id
}

This works fine. My issue is the size of my data. I have 4.5 million points and about 100,000 lines. It has been running for about a day so far and has processed only 200,000 of the 4.5 million points (despite a fairly powerful computer).

Is there something I can do to speed this up? For example if I was doing this in PostGIS I would add a spatial index, but this doesn't seem to be an option in R.

Or maybe I'm approaching this totally wrong?

TheRealJimShady
  • Don't go one point at a time; rather, go chunk by chunk. For instance, you can try `gDistance(points[1:10000,],lines,byid=TRUE)` and call `max.col` on the result to get the closest one. Then, you go with the second chunk and so on. – nicola Jan 19 '17 at 15:45
  • Hi @nicola. Thanks for your suggestion. Would you mind demonstrating how this would work? I don't quite understand. If I do this with the first 10 lines and the first 10 points, I end up with a distance matrix of 10x10. I'm not sure that's useful as I need to know the ID of the street that is closest to each point. Not the distance? – TheRealJimShady Jan 19 '17 at 16:05
  • I am working on a similar requirement and expect it to take about as long as yours. Did you get it solved? Any update on this, @nicola? – ds_user Aug 15 '17 at 00:08
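
The chunk-by-chunk idea from the comments can be sketched roughly as follows. This is a minimal base-R illustration, not the rgeos workflow itself: `segment_distances()` and `nearest_line_ids()` are hypothetical helpers, each line is assumed to be a single straight segment stored in a plain data frame, and the distance matrix is assumed to have one row per line and one column per point. With a real `gDistance(..., byid = TRUE)` result it is worth checking `dim()` first to confirm which way round the matrix is. The point is that the distance matrix is only an intermediate step: `which.min` on each point's column gives the row of the closest line, and that row number indexes the line IDs.

```r
# Hypothetical stand-in for gDistance(points, lines, byid = TRUE).
# Assumes every line is one straight, non-zero-length segment.
# Returns a matrix with one row per line and one column per point.
segment_distances <- function(pts, segs) {
  dx <- segs$x2 - segs$x1
  dy <- segs$y2 - segs$y1
  len2 <- dx^2 + dy^2
  d <- vapply(seq_len(nrow(pts)), function(j) {
    # Position of the projected point along each segment, clamped to [0, 1]
    t <- ((pts$x[j] - segs$x1) * dx + (pts$y[j] - segs$y1) * dy) / len2
    t <- pmin(pmax(t, 0), 1)
    sqrt((pts$x[j] - (segs$x1 + t * dx))^2 + (pts$y[j] - (segs$y1 + t * dy))^2)
  }, numeric(nrow(segs)))
  matrix(d, nrow = nrow(segs))  # keep matrix shape even for a single line
}

# Process the points in chunks and return the ID of the nearest line for each.
nearest_line_ids <- function(pts, segs, chunk_size = 10000) {
  ids <- character(nrow(pts))
  if (nrow(pts) == 0) return(ids)
  for (start in seq(1, nrow(pts), by = chunk_size)) {
    idx <- start:min(start + chunk_size - 1, nrow(pts))
    d <- segment_distances(pts[idx, , drop = FALSE], segs)
    # For each column (point), find the row (line) with the smallest distance
    ids[idx] <- segs$line_id[apply(d, 2, which.min)]
  }
  ids
}

# Toy data: two horizontal lines and one vertical line
segs <- data.frame(
  line_id = c("a", "b", "c"),
  x1 = c(0, 0, 5), y1 = c(0, 5, 0),
  x2 = c(10, 10, 5), y2 = c(0, 5, 10),
  stringsAsFactors = FALSE
)
pts <- data.frame(x = c(1, 2, 5.2, 9, 8), y = c(1, 4.5, 2.5, 0.5, 9))

nearest <- nearest_line_ids(pts, segs, chunk_size = 2)
print(nearest)
# "a" "b" "c" "a" "c"
```

The chunk size trades memory for the number of calls: each chunk builds a matrix of (number of lines) x (chunk size) distances, so with about 100,000 lines a chunk of 10,000 points already means a billion entries, and a smaller chunk may be needed in practice.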

0 Answers