0

I have two spatial datasets with coordinates indicating observations of a species and want to estimate the area of overlap among these datasets. Since point coordinates cannot represent an area, one has to bin the coordinates using similar x (longitude) and y (latitude) categories for both datasets.

For this task, I found the practical hexbin package, which does hexagonal binning. The package is great, but at least I fail to find a function that directly outputs the coordinates / IDs of overlapping bins among hexbin objects. For example, the hdiffplot returns a nice graphical overview of overlapping bins, but how to extract this information for further analysis?

library(hexbin)

set.seed(1); df1 <- data.frame(x = rnorm(10, 0, 5), y = rnorm(10, 0, 5))
set.seed(2); df2 <- data.frame(x = rnorm(10, 0, 5), y = rnorm(10, 0, 5))

xrange <- c(floor(min(c(df1$x, df2$x))-1), ceiling(max(c(df1$x, df2$x))+1))
#-/+1 just to make the plot nicer
yrange <- c(floor(min(c(df1$y, df2$y))-1), ceiling(max(c(df1$y, df2$y)))+1)

hb1 <- hexbin(df1$x, df1$y, xbins = 10, xbnds = xrange, ybnds = yrange)
hb2 <- hexbin(df2$x, df2$y, xbins = 10, xbnds = xrange, ybnds = yrange)

hdiffplot(hb1,hb2, xbnds = xrange, ybnds = yrange)

enter image description here

Mikko
  • 7,530
  • 8
  • 55
  • 92

1 Answers1

0

I figured out a solution to this problem while making the question. Will post it here in hopes that it will help someone one day.

You can extract the coordinates using the hcell2xy function. Here is a little function to find the unique and overlapping coordinates for bin centroids:

#' @title Print overlapping and unique bin centroid coordinates for two hexbin objects
#' @param bin1,bin2 two objects of class hexbin.
#' @details The hexbin objects for comparison, bin1 and bin2, must have the same plotting limits and cell size.
#' @return Returns a list of data frames with unique coordinates for \code{bin1} and \code{bin2} as well as overlapping coordinates among bins.

hdiffcoords <- function(bin1, bin2) {

  ## Checks modified from: https://github.com/edzer/hexbin/blob/master/R/hdiffplot.R
  if(is.null(bin1) | is.null(bin1)) {
    stop("Need 2 hex bin objects")
  } else {
        if(bin1@shape != bin2@shape)
            stop("Bin objects must have same shape parameter")
        if(all(bin1@xbnds == bin2@xbnds) & all(bin1@ybnds == bin2@ybnds))
            equal.bounds <- TRUE
        else stop("Bin objects need the same xbnds and ybnds")
        if(bin1@xbins != bin2@xbins)
            stop("Bin objects need the same number of bins")
  }

  ## Find overlapping and unique bins

  hd1 <- data.frame(hcell2xy(bin1), count_bin1 = bin1@count, cell_bin1 = bin1@cell)
  hd2 <- data.frame(hcell2xy(bin2), count_bin2 = bin2@count, cell_bin2 = bin2@cell)

  overlapping_hd1 <- apply(hd1, 1, function(r, A){ sum(A$x==r[1] & A$y==r[2]) }, hd2)
  overlapping_hd2 <- apply(hd2, 1, function(r, A){ sum(A$x==r[1] & A$y==r[2]) }, hd1)

  overlaps <- merge(hd1[as.logical(overlapping_hd1),], hd2[as.logical(overlapping_hd2),])

  unique_hd1 <- hd1[!as.logical(overlapping_hd1),]
  unique_hd2 <- hd2[!as.logical(overlapping_hd2),]

  ## Return list of data.frames

  list(unique_bin1 = unique_hd1, unique_bin2 = unique_hd2, overlapping = overlaps)

}

This information should be the same than returned by hdiffplot in graphical format:

df <- hdiffcoords(hb1, hb2)

library(ggplot2)

ggplot() + 
  geom_point(data = df$unique_bin1, aes(x = x, y = y), color = "red", size = 10) + 
  geom_point(data = df$unique_bin2, aes(x = x, y = y), color = "cyan", size = 10) +
  geom_point(data = df$overlapping, aes(x = x, y = y), color = "green", size = 10) + theme_bw() 

enter image description here

Any comments/corrections are appreciated.

Mikko
  • 7,530
  • 8
  • 55
  • 92