0

I have two perfectly overlapping rasters (same extents and cell size). For every cell in one raster (i.e. for every XY), I would like to determine the Euclidean geographical distance to the closest cell within a given threshold difference between the rasters.

Put another way: raster1 and raster2 measure some variable Z. I have a threshold difference for Z values (t) which constitutes a "matching" value (or "close enough") between raster1 and raster2. For each reference cell in raster1, I need to 1) find all the cells in raster2 with a Z value of abs(Z2-Z1)

Each raster has ~26 million cells, ~10 million of which have non-NA values. I have come up with a non-raster-based work-around for this problem, but only by converting the rasters to XYZ tables/vectors and performing a looping function for each reference cell. This is much too computationally intensive for the data size that I'm dealing with (takes ~10 days to process!). To assist comprehension of my question, however, that code is as follows:

library(SDMTools)
c.in <- asc2dataframe("reference.asc"); names(c.in) <- c("X","Y","Z")
f.in <- asc2dataframe("destination.asc"); names(f.in) <- c("X","Y","Z")

x=c.in$X
y=c.in$Y
c=c.in$Z
f=f.in$Z
dist=vector(length=length(c))
threshold <- 0.01

id <- 1:length(c)
for (i in length(id)) {
  # First, find all rows within the threshold
  t <- id[abs(f-c[i])<threshold]
  # Second, find the distance to the closest row
  dist[i] <- round(sqrt(min((x[t]-x[i])^2+(y[t]-y[i])^2)))
}

library(raster)
dist.rast <- rasterFromXYZ(x,y,dist)
David Roberts
  • 617
  • 1
  • 11
  • 23
  • 1
    This is going to take a while (++cells!). However I advise look at using the function `rasterToPoints()` function in the `raster` package and `nncross` from `spatstat` to do nearest neighbour distance calculations. I wrote a broad outline of how to do this [**here**](http://stackoverflow.com/a/15416923/1478381) which you may or may not find useful. The difference being that in that example I was calculating distance from raster cells to `SpatialLines` but you can do it with other `raster` cells too. (`nncross` for two point patterns [`ppp` objects]). – Simon O'Hanlon Oct 24 '13 at 08:43
  • Thanks for that. I tried the method you suggested and got it to work 90% of the way. But it wasn't a notable improvement in processing time. I did, however, figure out a more efficient table-based approach that reduced processing time by 3 orders of magnitude. It involves using the unique values of raster1 (instead of all cells) to create a table of suitable matches in raster2. Then it measures from each cell to the values listed in the table, based on the raster1 cell value. I can post the whole code if anyone is interested. – David Roberts Oct 24 '13 at 21:07

1 Answers1

1

You could set the values crossing the threshold to NA and then compute the 'as-the-crow-flies' shortest distance using the distance() function alongside the direction() function within the raster package. You'd then have two raster layers or matrices specifying the exact location of each cell (distance and direction), after a small Pythagorean calculation incorporating the raster resolution. For the sake of simplicity, you may want to remove the spatial projection from the rasters beforehand to remove the ellipsoidal component of the distance and direction calculations. They are easily added back later. If all this is too slow, I recommend trying sparseMatrix() or coding it in IDL or MATLAB. If you use RRO, which ships with MKL optimization, that should improve the performance of matrix operations.

You do excellent research by the way :) Say hi to Andreas for me.

Adam Erickson
  • 6,027
  • 2
  • 46
  • 33