4

I'm looking for some help understanding how to implement a 2-dimensional kernel density method, with a isotropic variance, and a bivariate normal kernel, kind of, but instead of using the typical distance, because the data is on the surface of the earth, I need to use a great-circle distance.

I'd like to replicate this in R, but I can't figure out how to use a distance metric other than the simple euclidean distance for any of the built in estimators, and since it uses a complex method with convolutions to add the kernels. Does anyone have a way to program an arbitrary kernel?

David Manheim
  • 2,553
  • 2
  • 27
  • 42

1 Answers1

5

I ended up modifying the kde2d function from the MASS library. Some significant revision was needed, as is shown below. That said, the code is very flexible, allowing an arbitrary 2-d kernel to be used. (rdist.earth() was used for the great circle distance, h is the chosen bandwidth, in this case, in km, and n is the number of grid points in each direction to be used. rdist.earth requires the "fields" library)

The function could be modified to perform calculations in more than 2d, but the grid gets large very fast in higher dimensions. (Not that it's small now.)

Comments and suggestions on elegance or performance are welcome!

kde2d_mod <- function (data, h, n = 200, lims = c(range(data$lat), range(data$lon))) {
#Data is a matrix: lon,lat for each source. (lon,lat to match rdist.earth format.)
print(Sys.time()) #for timing

nx <- dim(data)[1]
if (dim(data)[2] != 2) 
stop("data vectors have only lat-long data")
if (any(!is.finite(data))) 
stop("missing or infinite values in the data are not allowed")
if (any(!is.finite(lims))) 
stop("only finite values are allowed in 'lims'")
#Grid:
g<-grid(n,lims) #Function to create grid.

#The distance matrix gets large... Can we work around it? YES WE CAN!
sets<-ceiling(dim(g)[1]/10000)
#Allocate our output:
z<-rep(as.double(0),dim(g)[1])

for (i in (1:sets)-1) {
   g_subset=g[(i*10000+1):(min((i+1)*10000,dim(g)[1])),]
   a_matrix<-rdist.earth(g_subset,data,miles=FALSE)

   z[(i*10000+1):(min((i+1)*10000,dim(g)[1]))]<- apply( #Here is my kernel...
    a_matrix,1,FUN=function(X)
    {sum(exp(-X^2/(2*(h^2))))/(2*pi*nx)}
   )
rm(a_matrix)
}

print(Sys.time())
#Un-transpose the final data.
z<-t(matrix(z,n,n))
dim(z)<-c(n^2,1)
z<-as.vector(z)
return(z)
}

The key point here is that any kernel can be used in that inner loop; the downside is that this is evaluated at grid points, so a high-res grid is needed to run this; FFT would be great, but I didn't attempt it.

Grid Function:

grid<- function(n,lims) {
num <- rep(n, length.out = 2L)
gx <- seq.int(lims[1L], lims[2L], length.out = num[1L])
gy <- seq.int(lims[3L], lims[4L], length.out = num[2L])

v1=rep(gy,length(gx))
v2=rep(gx,length(gy))
v1<-matrix(v1, nrow=length(gy), ncol=length(gx))
v2<-t(matrix(v2, nrow=length(gx), ncol=length(gy)))
grid_out<-c(unlist(v1),unlist(v2))

grid_out<-aperm(array(grid_out,dim=c(n,n,2)),c(3,2,1) ) #reshape
grid_out<-unlist(as.list(grid_out))
dim(grid_out)<-c(2,n^2)
grid_out<-t(grid_out)
return(grid_out)
}

You can plot the values using image.plot, with the v1 and v2 matrices for your x,y points:

kde2d_mod_plot<-function(kde2d_mod_output,n,lims) ){
 num <- rep(n, length.out = 2L)
 gx <- seq.int(lims[1L], lims[2L], length.out = num[1L])
 gy <- seq.int(lims[3L], lims[4L], length.out = num[2L])

 v1=rep(gy,length(gx))
 v2=rep(gx,length(gy))
 v1<-matrix(v1, nrow=length(gy), ncol=length(gx))
 v2<-t(matrix(v2, nrow=length(gx), ncol=length(gy)))

 image.plot(v1,v2,matrix(kde2d_mod_output,n,n))
 map('world', fill = FALSE,add=TRUE)
}
David Manheim
  • 2,553
  • 2
  • 27
  • 42
  • In some interval, measured in hours, you can accept your answer. (It doesn't seem to be a drop-in replacement for kde2d since naively running it with the example in MASS does not succeed. I also get an error with `image(grid) Error in image.default(grid) : increasing 'x' and 'y' values expected`) – IRTFM Nov 22 '13 at 20:01
  • It's not a drop in replacement; the MASS library assumes uncorrelated X,Y kernels, which is only true in a very specific case they handle. Also, the image.plot(output,v1,v2) works for me, but only using the v1, v2 matrices from the grid function; I added code for a new function to do this. – David Manheim Nov 22 '13 at 20:04
  • Still get same error with `with(grid[order(grid$x, grid$y), ], image.plot(x,y,z) )`. I guess my question is which object is being plotted. Sorry to be so dense. – IRTFM Nov 22 '13 at 20:08
  • Try the new function. The output from the kde2d_mod is plotted, using grid$x,grid$y as coordinates. – David Manheim Nov 22 '13 at 20:11