
I have point data off the coast of Oahu. Someone else used those same data to create a large polygon. I believe he first created a heatmap using a quartic (biweight) kernel with a radius of 1 km around each point and perhaps a 1 km-square pixel size. He cited Silverman (1986, p. 76, equation 4.5), which I believe refers to the book “Density Estimation for Statistics and Data Analysis”. I believe he then converted his heatmap into his polygon. I am attempting to approximate his polygon with fake data using R on Windows 10. I can come close using the kde function in the ks package (see the figure below), but that package only includes Gaussian kernels. Is it possible to create a similar polygon using a quartic kernel?

[Figure: fake points off Oahu with the Gaussian-kernel density contour (fake_points_oahu_June11_2021.png)]

The other analyst actually created two versions of the polygon. The border of one was labeled “> 1 per km density”; the border of the other was labeled “> 0.5 per km density”. I do not know whether he used R, QGIS, ArcGIS, or something else. I was unable to create a single large polygon in QGIS, and I do not have ArcGIS.

Thank you for any suggestions on how to create a polygon similar to the one shown but using a quartic kernel instead of a Gaussian kernel. If I can provide additional information please let me know.

Here is a link to my fake data in CSV and QGIS format (Google Drive): [link]. (EDIT: Hopefully anyone can access the fake data now. I could before, but I guess others could not.)

1. fake_points_oahu.csv

     a. raw data

2. fake_points_oahu_utm (.shp, .dbf, .prj, .shx) 

     a. vector point layer 

3. fake_points_oahu_June11_2021.png

     a. the figure shown above

Here is my R code:

setwd('C:/Users/mark_/Documents/ctmm/density_in_R/density_files_for_StackOverflow/')

library(sf) # to read shapefile
library(ks) # to use kde function

my.data <- read.csv("fake_points_oahu.csv", header = TRUE, stringsAsFactors = FALSE, na.strings = "NA")
head(my.data)

# Import shapefile
st_layers("fake_points_oahu_utm.shp")

points_utm <- st_read(dsn = "fake_points_oahu_utm.shp", layer = 'fake_points_oahu_utm')
st_crs(points_utm)
plot(points_utm)

my.matrix <- as.matrix(my.data[,2:3])
head(my.matrix)

# This uses the Gaussian kernel; Hpi() computes a plug-in bandwidth matrix
my_gps_hpi <- Hpi(x = my.matrix, pilot = "samse", pre = "scale")

my.fhat <- kde(x = my.matrix, compute.cont = TRUE, h = my_gps_hpi,
               xmin = c(min(my.data$longitude), min(my.data$latitude)),
               xmax = c(max(my.data$longitude), max(my.data$latitude)),
               bgridsize = c(500, 500))

# 96.5% contour: the highest-density region containing ~96.5% of the probability mass
my.contours <- c(96.5)

contourLevels(my.fhat, cont = my.contours)
contourSizes(my.fhat, cont = my.contours, approx = TRUE)

plot(my.data$longitude, my.data$latitude)
plot(my.fhat, lwd = 3, display = "filled.contour", cont = my.contours, add = TRUE)

png(file = "fake_points_oahu_June11_2021.png")
plot(my.data$longitude, my.data$latitude)
plot(my.fhat, lwd = 3, display = "filled.contour", cont = my.contours, add = TRUE)
dev.off()
Mark Miller
  • Your sample data is not accessible. – kwes Jun 13 '21 at 18:43
  • @kwes I have now used the copy-link feature on Google Drive and selected to share with anyone who has that link. Then I copied that link here. Hopefully you can access the fake data now. – Mark Miller Jun 13 '21 at 20:42

1 Answer


You can perform your estimation by slightly modifying the kde2d function from the MASS package. To my knowledge, no R package currently implements bivariate KDE with a quartic (biweight) kernel.

The univariate biweight kernel can be extended to a multivariate kernel in several ways; the simplest is a product kernel, where you apply the univariate kernel to each dimension and multiply the results. You can find the mathematical expression for the biweight product kernel here (under section 6.3.2, Multivariate Density Estimation).
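For reference, the univariate biweight kernel and its product extension are shown below, together with Silverman's radially symmetric bivariate variant (his equation 4.5) for comparison:

$$K(u) = \tfrac{15}{16}\,(1-u^2)^2 \;\text{ for } |u| \le 1, \qquad K_2(u,v) = K(u)\,K(v)$$

$$K_S(\mathbf{u}) = \tfrac{3}{\pi}\,(1-\mathbf{u}^\top\mathbf{u})^2 \;\text{ for } \mathbf{u}^\top\mathbf{u} \le 1$$

When you incorporate the product kernel into the kde2d density estimator from the MASS package, it looks as follows: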

kde_biweight_kernel <- function(x,y, bw_x, bw_y, xrange, yrange){
  # This function is based on the kde2d function from 
  # the MASS package. The only difference is that the Gaussian
  # kernel is substituted with a biweight product kernel
  
  # product kernel:
  biweight_kernel <- function(u){
    mask = abs(u) > 1
    kernel_val = (15/16)*((1-u^2)^2)
    kernel_val[mask] = 0
    return(kernel_val)
  }
  
  lims = c(xrange, yrange)
  n = 500
  nx <- length(x)
  n <- rep(n, length.out = 2L)
  # get grid on which we want to estimate the density
  gx <- seq.int(lims[1L], lims[2L], length.out = n[1L])
  gy <- seq.int(lims[3L], lims[4L], length.out = n[2L])
  
  # inputs to kernel
  ax <- outer(gx, x, "-" )/bw_x
  ay <- outer(gy, y, "-" )/bw_y
  
  # evaluate and multiply kernel results along both axes
  res = tcrossprod(biweight_kernel(ax), biweight_kernel(ay))/(nx * bw_x * bw_y)
  return(list(x = gx, y = gy, z = res))
}

Using the kde_biweight_kernel function, you can compute your desired density as follows:

library(MASS)
library(birk)
library(kedd)
library(sf)
library(ks)


# load data
my.data <- read.csv("fake_points_oahu.csv", header = TRUE, stringsAsFactors = FALSE, na.strings = "NA")
# Import shapefile
st_layers("fake_points_oahu_utm.shp")
points_utm <- st_read(dsn = "fake_points_oahu_utm.shp", layer = 'fake_points_oahu_utm')

x = my.data$longitude
y = my.data$latitude

# determine bandwidth for biweight kernel along both axes
bw_x = h.amise(x, deriv.order = 0, kernel = "biweight")$h
bw_y = h.amise(y, deriv.order = 0, kernel = "biweight")$h

# get ranges in which you want to estimate density
xrange = c(min(my.data$longitude), max(my.data$longitude))
yrange = c(min(my.data$latitude),  max(my.data$latitude))

# get 2d density estimate with quartic (biweight) kernel
result = kde_biweight_kernel(x,y, bw_x, bw_y, xrange, yrange)

Note that the bandwidth is computed specifically for the biweight kernel. The resulting density object is a bit different from a ks::kde object; for example, it does not have contour levels yet. We can get the contour levels by computing quantiles with a slightly modified version of the kde2dQuantile function from the rmngb package:

# get quantiles of interest:
kde2dQuantile <- function(d, X, Y, probs = .05) {
  xInd <- sapply(X, function(x) which.closest(d$x, x))
  yInd <- sapply(Y, function(x) which.closest(d$y, x))
  zValues <- d$z[cbind(xInd, yInd)]
  quantile(zValues, probs=probs)
}
# get quantiles
quantiles = kde2dQuantile(result, x, y, seq(0,1,by=0.001))
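
For example (assuming the names that quantile() assigns, such as "96.5%"), you can look up the level corresponding to the contour used in the question:

quantiles["96.5%"]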

From your question I am not sure which quantile you are interested in, so I just chose the 1% quantile. To be able to plot the data in the same way as in the question, we have to format the density result in the same way as objects of the kde class:

# to make the kde estimate compatible with the other density estimates
# from the ks package, the result can be converted to a named list.
# -> create ks::KDE object:
axes = matrix(c(result$x,result$y), ncol = 2)
colnames(axes) = c('longitude', 'latitude')

my.fhat_biweight = list('x' = axes,
                        'eval.points' = list(result$x, result$y),
                        'estimate' = result$z,
                        'gridtype' = 'linear', 'gridded' = TRUE,
                        'binned' = TRUE, 'names' = c("longitude","latitude" ))

# add quantile to ks::KDE object
my.fhat_biweight$cont = quantiles

# change class (make sure ks package is loaded for this)
class(my.fhat_biweight) <- "kde"

Finally, plot the biweight kernel density over the data:

plot(my.data$longitude, my.data$latitude)
plot(my.fhat_biweight, lwd = 3, display = "filled.contour", cont = c(96.5), add = TRUE)

This outputs:

[Figure: biweight-kernel filled density contour plotted over the fake points]

yuki
  • I am a little troubled by how much this extends over the land of Oahu. The Gaussian kernel contour seems to follow the coastline rather closely. I am not sure I see where you used the 1% quantile but I need to study this more. I am not sure, but I was thinking the other analyst was drawing his contours based on absolute values of density (0.5 or 1) but I could be wrong about that. Regardless, thank you for all of your work on this. – Mark Miller Jun 17 '21 at 12:33
  • Yes you are absolutely right, it was the wrong link and I updated it. The expression for the product kernel is under 6.3.2 Multivariate Density Estimation. – yuki Jun 17 '21 at 16:46
  • I was also not quite sure about the confidence level and used the same one as you provided in your question. There are also more sophisticated variants to extend the univariate biweight kernel to the bivariate case - the product kernel is really the easiest way. You can find the 1% quantile value in the quantiles vector by calling ```my.fhat_biweight$cont```. But from how I understood the plot + "display.quantile" function the ```cont``` argument is a vector of quantile levels so you actually passed the 96.5% quantile? Correct me if I'm wrong! – yuki Jun 17 '21 at 16:53
  • Thank you, yuki. Which line in your code selects the 1% quantile? I thought this line generates all quantiles: `quantiles = kde2dQuantile(result, x, y, seq(0,1,by=0.001))`. I might misunderstand. I used the 96.5% contour but suspect the original analyst did not. I suspect he used a contour based on grid cells with a density value > 0.5 (or > 1). I might be wrong. Is it possible to generate a polygon based on the density value within grid cells? Should the 2nd `plot` statement in your code be: `plot(my.fhat_biweight, lwd = 3, display = "filled.contour", cont = c(96.5), add = TRUE)`? – Mark Miller Jun 17 '21 at 19:48
  • I actually didn't select the 1% quantile. You could get the 1% quantile from the quantile list with ```quantiles["1.0%"]``` and then pass it to the ```cont``` argument in the plot. But are you sure you want the 1% quantile? Maybe you could post the literature link to the question so that the “> 1 per km density” description maybe becomes a bit clearer? – yuki Jun 18 '21 at 12:20
  • Thank you, yuki. I did not think you selected the 1% quantile. I got confused by this statement in your post: `From your question I am not sure which quantile you are interested in so I just chose the 1% quantile.` I do not have a literature link for `> 1 per km density`. It is not really clear to me. QGIS seems to allow it. I posted a detailed example of how to do it with QGIS and the same data set you have: https://gis.stackexchange.com/questions/400853/heatmap-with-contour-polygons, but QGIS creates a huge number of tiny polygons instead of the one large polygon I want. – Mark Miller Jun 18 '21 at 13:24
  • I am not certain the QGIS example at the link in my previous comment does create polygons based on `> 0.5 per km density`. I suspected that QGIS was doing that. But I am not sure. The phrases `> 1 per km density` and `> 0.5 per km density` came from the original analyst and have never been clear to me. He cited Silverman (1986, p. 76, equation 4.5) “Density Estimation for Statistics and Data Analysis”. That is the only reference he provided. – Mark Miller Jun 18 '21 at 13:42
  • I checked the book, and in that equation they only give the expression for the biweight kernel. It is actually a bit different from the one I used: ```(3/pi)*((1-u^2)^2)``` instead of 15/16*… You could change this so that the kernel is exactly the one implemented in QGIS. – yuki Jun 20 '21 at 14:56
  • Could it be that the author is evaluating the density on a grid of kilometres and then draws the contour where the density is > 1 or > 0.5? Then they would mean density per square km. This would be an interpretation of “> 1 per km density” that would make sense to me. Maybe check whether this works when you change the biweight kernel to the specification I mentioned in my last comment, and see whether you can obtain a useful polygon from that? It would be a bit odd, though, since just thresholding the density value would not correspond to a specific confidence level. – yuki Jun 20 '21 at 15:04
  • Thank you. I think you are correct in your last comment about how he created his polygon. But I have no idea how to do that. QGIS creates a huge number of tiny polygons. Presumably QGIS is just connecting neighboring grid cells with a value >1 (or > 0.5). I have no idea how to transform all of those tiny QGIS polygons into one large polygon based on the estimate of density per grid cell you described. Could you help me with that? Thank you, regardless. – Mark Miller Jun 22 '21 at 04:44
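
Following up on the last two comments, here is a minimal sketch of the "absolute density per square km" interpretation. This is only an assumption about the original analyst's method, not a confirmed reconstruction: it supposes the density is estimated on km-scaled coordinates (UTM metres divided by 1000), so that n * density is an intensity in points per square km, and then traces the 0.5 isoline into a single polygon. It reuses kde_biweight_kernel and the h.amise bandwidths from the answer, plus base R's contourLines and sf; the helper names (xy_km, res_km, intensity) are mine.

library(sf)

# ASSUMPTION: work in km so that densities are per square km
xy_km <- st_coordinates(points_utm) / 1000
x_km  <- xy_km[, 1]
y_km  <- xy_km[, 2]

# biweight-specific bandwidths on the km-scaled coordinates
bw_x_km <- h.amise(x_km, deriv.order = 0, kernel = "biweight")$h
bw_y_km <- h.amise(y_km, deriv.order = 0, kernel = "biweight")$h

res_km <- kde_biweight_kernel(x_km, y_km, bw_x_km, bw_y_km,
                              range(x_km), range(y_km))

# the KDE integrates to 1, so n * density = expected points per square km
intensity <- length(x_km) * res_km$z

# trace the "> 0.5 per square km" isoline (use levels = 1 for the other polygon)
cl <- contourLines(res_km$x, res_km$y, intensity, levels = 0.5)

# turn each traced ring into a polygon and merge them into one geometry;
# caveats: rings cut off at the grid edge are closed naively, and any
# "holes" inside the contour are not treated as holes
rings <- lapply(cl, function(l) {
  m <- cbind(l$x, l$y)
  st_polygon(list(rbind(m, m[1, ])))
})
poly <- st_union(st_sfc(rings))

plot(poly)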