3

I was wondering how to calculate the intersection between two ellipses e.g. the volume of the intersection between versicolor and virginca as illustrated in this graph: PCA on iris dataset which is plotted using the following mwe based on this tutorial:

data(iris)
log.ir <- log(iris[, 1:4])
ir.species <- iris[, 5]
ir.pca <- prcomp(log.ir, center = TRUE, scale. = TRUE)

library(ggbiplot)
g <- ggbiplot(ir.pca, obs.scale = 1, var.scale = 1, 
          groups = ir.species, ellipse = TRUE,
          circle = TRUE)
g <- g + scale_color_discrete(name = '')
g <- g + theme(legend.direction = 'horizontal', 
           legend.position = 'top')
print(g)

I get the covariances and centres for the ellipses as follows:

setosa.cov <- cov(ir.pca$x[ir.species=="setosa",])
versicolor.cov <- cov(ir.pca$x[ir.species=="versicolor",])
virginica.cov <- cov(ir.pca$x[ir.species=="virginica",])
setosa.centre <- colMeans(ir.pca$x[ir.species=="setosa",])
versicolor.centre <- colMeans(ir.pca$x[ir.species=="versicolor",])
virginica.centre <- colMeans(ir.pca$x[ir.species=="virginica",])

But then I am at my wit's end :-|

Edit: Following the indications of @carl-witthoft below, here an example using siar::overlap:

library(siar)
setosa <- ir.pca$x[ir.species=="setosa",]
versicolor <- ir.pca$x[ir.species=="versicolor",]
virginica <- ir.pca$x[ir.species=="virginica",]

overlap.fun <- function(data.1, data.2){
   dimensions <- ncol(data.1)
   for(i in 1:(dimensions-1)){
      overlap.out <- overlap(data.1[,i], data.1[,i+1], data.2[,i], data.2[,i+1], steps = 5)
      out$overlap[i] <- overlap.out$overlap
      out$area1[i] <- overlap.out$area1
      out$area2[i] <- overlap.out$area2
   }
   return(out)
}

overlap.fun(versicolor, virginica)

returns:

$overlap
[1] 0.01587977 0.48477088 0.08375927
$area1
[1]1.020596 1.04614461 0.08758691                 
$area2
[1] 1.028594 1.1535106 0.1208483

strangely enough when I do a percentage calculation the values do not really correspond to the ellipsoids in the ggbiplot PCA:

tmp <- overlap(versicolor[,1], versicolor[,2], virginica[,1], virginica[,2], steps = 5)
virginica.percentage <- round(x=(tmp$overlap/tmp$area2*100), digits = 2)
versicolor.percentage <- round(x=(tmp$overlap/tmp$area1*100), digits = 2)
> virginica.percentage [1] 1.54
> versicolor.percentage[1] 1.56

which is much less than indicated in the Figure 1 above. But might better open another thread on this here.

Community
  • 1
  • 1
raumkundschafter
  • 429
  • 1
  • 8
  • 24
  • The fundamental method is to find the intersection points, calculate the integrals of the "upper" and "lower" curves, and take the distance. You need to split this up to ensure each integral is over a single-valued function range. That said, I seem to recall there's a package or two on CRAN which include this sort of intersection-area calculation. Naturally I can't recall which ones :-( – Carl Witthoft May 17 '16 at 12:41

1 Answers1

4

Possible tools:

 spatstat::overlap.owin , geo::geointersect, siar::overlap .

You may ask -- and you should ask -- "How did he get those answers so fast?

Get thee the package sos and type ???overlap

Carl Witthoft
  • 20,573
  • 9
  • 43
  • 73
  • Thank you very much indeed for the light-speed answer! I tried the siar::overlap as it's the most understandable to me. `library(siar)` `versicolor <- ir.pca$x[ir.species=="versicolor",]` `virginica <- ir.pca$x[ir.species=="virginica",]` `overlap(versicolor[,1], versicolor[,2], virginica[,1], virginica[,2], steps = 5)` returns: '$overlap' `[1] 0.01587977` `$area1` `[1] 1.020596` `$area2` `[1] 1.028594` – raumkundschafter May 17 '16 at 15:11
  • @sesselastronaut . Glad that works - if it fully meets your needs, please mark the answer as "accepted." – Carl Witthoft May 17 '16 at 15:14
  • @carl-wittcroft it definitely does for the first two PCs as shown by the example. I was wondering how to do it over all the four principal components originating from the pca on the iris dataset? Compare them all and add them up? – raumkundschafter May 17 '16 at 15:41
  • The `spatstat::overlap.owin` seem to only accept ["Windows (objects of class "owin") or data acceptable to as.owin."](http://www.inside-r.org/packages/cran/spatstat/docs/union.owin). How to transpose the ellipses into such an object might be a topic for another thread though. `geo::geointersect` similarly works with polygons. – raumkundschafter May 17 '16 at 15:52