
I am using gIntersection to clip a nationwide path network by polygons one at a time from a SpatialPolygonsDataFrame. I am looping through each polygon, clipping the path network, calculating the length of the clipped paths, and saving this to a dataframe called path.lgth:

poly<-readShapePoly("C:\\temp\\polygons.shp")
paths<-readShapeLines("C:\\temp\\paths.shp")


#loop through all polygons clipping lines

path.lgth<-data.frame()

for (i in 1:length(poly)){
  clip<-gIntersection(paths,poly[i,])
  lgth<-gLength(clip)
  vid<-poly@data[i,3]
  data<-cbind(vid,lgth)
  path.lgth<-rbind(path.lgth,data)
  print(i)
}

The vid line just extracts the polygon ID to save in the dataframe with the path length.

My problem is that it takes far too long just to do the first polygon (around 12 minutes!). Is there a way to speed this up? I'm not sure what gIntersection does mathematically (is it checking every path to see whether it overlaps the polygon?). I have simplified my paths so they are only one feature.

Thanks for your help.

kazachoo

3 Answers


Since I was facing the same kind of issue, here's my workaround to reduce processing time. (It's a 4-year-old question! I hope some people, like me, are still facing such problems.)
I'd advise first selecting only the line features involved in each gIntersection step, using the closely related function gIntersects, which returns a logical and is much faster than gIntersection!

Your code would then look like this:

poly<-readShapePoly("C:\\temp\\polygons.shp")
paths<-readShapeLines("C:\\temp\\paths.shp")

#loop through all polygons clipping lines
path.lgth<-data.frame()

for (i in 1:length(poly)){
  # FIRST use gIntersects to subset the features you need
  LogiSubPaths <- gIntersects(paths, poly[i, ], byid = TRUE)[1, ] # logical matrix with one row
  clip <- gIntersection(paths[LogiSubPaths, ], poly[i, ]) # intersect only the paths that touch polygon 'i'
  lgth <- gLength(clip)
  vid <- poly@data[i, 3]
  data <- cbind(vid, lgth)
  path.lgth <- rbind(path.lgth, data)
  print(i)
}

It's worth trying this on a real dataset to confirm that the output matches your needs.


If I understand you correctly, you have N polygons and M paths, and for each polygon you want the total length of the paths inside it, right?

Solution 1

First merge all the lines into one feature, then do the intersection in a single call with byid = TRUE. This way you get rid of the loop:

paths2 <- gLineMerge(paths)
path.crop <- gIntersection(poly, paths2, byid = TRUE)
path.lgth <- gLength(path.crop, byid = TRUE)

You should get the lengths labelled by the ids of the polygons. I am not sure whether the polygon ids there will be correct - check them in path.crop. If not, you need to set the id parameter of gIntersection to the ids of the polygons, or map the result ids back yourself, as in the sketch below.
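As an illustration, here is a hedged sketch of that mapping. It assumes rgeos's usual byid labelling, where each result geometry is named with the two parent ids separated by a space, and it reuses column 3 of poly@data as the polygon attribute, as in your question:

poly.ids <- sapply(strsplit(names(path.lgth), " "), `[`, 1)  # first token = polygon id

# join the lengths back to the polygon attribute of interest
result <- data.frame(vid  = poly@data[match(poly.ids, row.names(poly@data)), 3],
                     lgth = unname(path.lgth))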

Solution 2

I am not sure whether sp::over can be used to make a clever query here; it is worth examining. A rough sketch follows.
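Something along these lines might work (an untested sketch: it assumes that, with rgeos loaded, over() between polygons and lines returns, for each polygon, the indices of the intersecting lines when returnList = TRUE):

library(rgeos)   # over() for line/polygon geometry pairs needs rgeos

# for each polygon, the indices of the paths that touch it
hits <- over(poly, as(paths, "SpatialLines"), returnList = TRUE)

path.lgth <- data.frame()
for (i in seq_along(hits)) {
  idx <- hits[[i]]
  if (length(idx) == 0) next                 # no paths touch this polygon
  clip <- gIntersection(paths[idx, ], poly[i, ])
  path.lgth <- rbind(path.lgth,
                     data.frame(vid = poly@data[i, 3], lgth = gLength(clip)))
}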

Tomas
  • Thank you for your reply. I had already merged the paths, but having explored byid=T before doing this it had slipped my mind. I am trying it out now (currently already running for 10 minutes). I'll let you know the outcome! – kazachoo Dec 03 '13 at 14:23
  • So I left the gIntersection (single path feature intersected with 45K polygons, byid=T) running since 2pm yesterday afternoon and it still hasn't finished 20 hours later! This is really delaying me - I've been trying different ways to get this data for 2 weeks! Any other ideas? – kazachoo Dec 04 '13 at 11:15
  • @kazachoo, you have 45,000 polygons??? That's a lot... How long does it take to compute for a single polygon? How long for 100 polygons? – Tomas Dec 04 '13 at 11:27
  • <10 seconds for 6 polygons, <3mins for 100 polygons. Yes I have 45,000 polygons, so I've been splitting by region and running one per day with the original loop that I wrote and so it took days. Plus I have to do it all again for my controls which are double (90K)! I tried them all at once with your solution above, hoping that if I left it overnight it would complete, but no such luck! – kazachoo Dec 04 '13 at 11:59
  • @kazachoo wow, that's a lot of time. It seems the lines (and maybe the polygons too) are very complex? Are there a lot of line segments in those lines? If so, you can try [gSimplify](http://www.inside-r.org/packages/cran/rgeos/docs/gSimplify): run it on 100 polygons, see how different the result gets, and set the tolerance parameter accordingly. – Tomas Dec 04 '13 at 12:12

The first thing to do is to avoid reallocating memory on each pass through the loop.
Instead of path.lgth <- rbind(path.lgth, data), initialize prior to the loop:

path.lgth<-matrix(nrow=length(poly), ncol=2)

Then inside the loop, dump the cbind (gross overkill) and do

path.lgth[i,] <- c(vid,lgth)
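
Put together, the loop would look roughly like this (a sketch; it assumes the id in column 3 of poly@data is numeric, since a matrix holds a single type):

path.lgth <- matrix(nrow = length(poly), ncol = 2)
colnames(path.lgth) <- c("vid", "lgth")

for (i in 1:length(poly)) {
  clip <- gIntersection(paths, poly[i, ])
  path.lgth[i, ] <- c(poly@data[i, 3], gLength(clip))
  print(i)
}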

Now as to overall execution time - you didn't say anything about what CPU (and RAM) you have available. But check out Rprof to get an idea of which steps are taking most of your time. For example:
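A minimal profiling sketch using base R's Rprof (the output file name and the choice to profile only the first few polygons are arbitrary):

Rprof("clip_profile.out")                 # start profiling
for (i in 1:10) {                         # profile a small subset first
  clip <- gIntersection(paths, poly[i, ])
  lgth <- gLength(clip)
}
Rprof(NULL)                               # stop profiling
summaryRprof("clip_profile.out")$by.self  # see which calls dominate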

Carl Witthoft
  • Thank you for your help. Good advice on loop efficiency - I'm a beginner as you can tell! However, I'm sure it is the gIntersection taking up the most time as I tested it on just one polygon (so just running this command) and it takes many minutes. – kazachoo Dec 03 '13 at 14:26