How to calculate the area of a ggplot stat_ellipse() when 'type = "norm"?

Question

Is there any way to calculate the area of this ellipse when type = "norm"?

Default is type = "t". type = "norm" displays a different ellipse because it assumes a multivariate normal distribution instead of multivariate t-distribution

Here is the code and the plot (using similar code as other post):

library(ggplot2)
set.seed(1234)
data <- data.frame(x = rnorm(1:1000), y = rnorm(1:1000))

ggplot (data, aes (x = x, y = y))+
  geom_point()+
  stat_ellipse(type = "norm")

Previous answer was:

#Plot object
p = ggplot (data, aes (x = x, y = y))+
  geom_point()+
  stat_ellipse(segments=201) # Default is 51. We use a finer grid for more accurate area.

#Get ellipse coordinates from plot

pb = ggplot_build(p)
el = pb$data[[2]][c("x","y")]

# Center of ellipse

ctr = MASS::cov.trob(el)$center 
# I tried changing this to 'stats::cov.wt' instead of 'MASS::cov.trob' 
#from what is saw from (https://github.com/tidyverse/ggplot2/blob/master/R/stat-ellipse.R#L98)

# Calculate distance to center from each point on the ellipse

dist2center <- sqrt(rowSums((t(t(el)-ctr))^2))

# Calculate area of ellipse from semi-major and semi-minor axes. 
These are, respectively, the largest and smallest values of dist2center. 

pi*min(dist2center)*max(dist2center)

Changing to stats::cov.wt wasn't enough to get the area of the "norm" ellipse (value calculated was the same). Any ideas on how to change the code?

Thanks!

Welcome to SO. Nice first question. small tip You can produce your data frame easier using `data.frame(x = rnorm (1:1000), y =rnorm (1:1000))` — tjebo, Feb 26 '20 at 23:49
Does this answer your question? [How to calculate the area of ellipse drawn by ggplot2?](https://stackoverflow.com/questions/38782051/how-to-calculate-the-area-of-ellipse-drawn-by-ggplot2) — Daniel Muñoz, Feb 27 '20 at 00:07
@DanielMuñoz that SO post is linked in my question, it is what I am asking about — Olivia Jean, Feb 28 '20 at 22:28
@OliviaJean aaa I understand you friend, sorry is that I had not understood you well — Daniel Muñoz, Feb 28 '20 at 22:30

tjebo · Answer 1 · 2020-02-27T14:57:13.180

Nice question, I learned something. But I cannot reproduce your problem and get (of course) different values with the different approaches.

I think the approach in the linked answer is not quite correct because the ellipse center is not calculated with the data, but based on the ellipse coordinates. I have updated to calculate this based on the data.

library(ggplot2)

set.seed(1234)
data <- data.frame(x = rnorm(1:1000), y = rnorm(1:1000))

p_norm <- ggplot(data, aes(x = x, y = y)) +
  geom_point() +
  stat_ellipse(type = "norm")

pb <- ggplot_build(p_norm)
el <- pb$data[[2]][c("x", "y")]
ctr <- MASS::cov.trob(data)$center #updated previous answer here
dist2center <- sqrt(rowSums((t(t(el) - ctr))^2))
pi * min(dist2center) * max(dist2center)
#> [1] 18.40872

^{Created on 2020-02-27 by the reprex package (v0.3.0)}

update thanks to Axeman for the thoughts.

The area can be directly calculated from the covariance matrix by calculating the eigenvalues first. You need to scale the variances / eigenvalues by the factor of confidence that you want to get. This blog helped me a lot to understand this a bit better

set.seed(1234)
dat <- data.frame(x = rnorm(1:1000), y = rnorm(1:1000))

cov_dat <- cov(dat) # covariance matrix

eig_dat <- eigen(cov(dat))$values #eigenvalues of covariance matrix

vec <- sqrt(5.991* eig_dat) # half the length of major and minor axis for the 95% confidence ellipse

pi * vec[1] * vec[2]  
#> [1] 18.38858

^{Created on 2020-02-27 by the reprex package (v0.3.0)}

In this particular case, the covariances are zero, and the eigenvalues will be more or less the variance of the variables. So you can use just the variance for your calculation. - given that both are normally distributed.

set.seed(1234)
data <- data.frame(x = rnorm(1:1000), y = rnorm(1:1000))

pi * 5.991 * sd(data$x) * sd(data$y) # factor for 95% confidence = 5.991
#> [1] 18.41814

^{Created on 2020-02-27 by the reprex package (v0.3.0)}

The factor 5.991 represents the Chi-square likelihood for the 95% confidence of the data. For more information, see this thread

I think you should be able to get the area from just the covariance matrix given by `cov.wt`, although not sure how. But, you can definately avoid all the ggplot stuff by using `car::ellipse` directly on the covariance matrix. — Axeman, Feb 27 '20 at 00:07
@Axeman Here we go. Ignore my last comment. See my updated answer for calculation based on the covariance matrix. I think this would be the way to go. In this particular case one can simly use the variable variance — tjebo, Feb 27 '20 at 14:42
@Axeman It represents the chi-square likelihood - this is based on a few assumptions on the data, here mainly normal distribution. I found the thread linked in the answer very helpful — tjebo, Feb 27 '20 at 14:52

score 0 · Answer 2 · answered Feb 27 '20 at 15:09

Here is how to get a and b (then the area is pi*a*b) without using the data generated by stat_ellipse.

library(ggplot2)
gg <- ggplot(faithful, aes(eruptions, waiting)) +
  geom_point() +
  stat_ellipse(type = "norm", segments = 2000)

Sigma <- cov(faithful) 
evalues <- eigen(Sigma, symmetric = TRUE, only.values = TRUE)$values
p <- 0.95
r <- 2 * qf(p, 2, nrow(faithful)-1)
( a <- sqrt(r * evalues[1]) )
# 33.55752
( b <- sqrt(r * evalues[2]) )
# 1.216351

### check
ggb <- ggplot_build(gg)
el <- ggb$data[[2]][c("x","y")]
center <- colMeans(faithful)
dist2center <- sqrt(rowSums((t(t(el)-center))^2))
max(dist2center)
# 33.55751
min(dist2center)
# 1.216396

score 0 · Answer 3 · answered Mar 02 '20 at 18:06

Here is the solution I came up with using the code from the ellipse package using the quakes dataset. Its a lot longer but simpler to understand how it works (at least for me). I believe the area solution is in meters squared. Any thoughts on this approach?

#Both plots together
Bothgg <- ggplot(quakes, aes(long, lat)) +
  geom_point() +
  stat_ellipse(type = "t")+    #type = "t" is unnecessary because it is the default, but I put it here for clarity
  stat_ellipse(type = "norm", linetype = 2)
Bothgg



###From ellipses code
dfn <- 2
dfd <- nrow(quakes) - 1
segments = 51
level = .95

#Area for solid line
  v <- MASS::cov.trob(quakes[ ,c(1,2)])
  shape <- v$cov
  center <- v$center
  chol_decomp <- chol(shape)
  radius <- sqrt(dfn * stats::qf(level, dfn, dfd))
  angles <- (0:segments) * 2 * pi/segments
  unit.circle <- cbind(cos(angles), sin(angles))
  ellipse <- as.data.frame(t(center + radius * t(unit.circle %*% chol_decomp)))
  centerd = as.data.frame(center)
  ellipse$centerLat = centerd[1,1]
  ellipse$centerLong = centerd[2,1]
  ellipse$distance = distm(ellipse[,c('long','lat')], ellipse[,c('centerLong','centerLat')], fun=distVincentyEllipsoid)
pi*(min(ellipse$distance)/2)*(max(ellipse$distance)/2)


#Area for dashed line
  v <- stats::cov.wt(quakes[ ,c(1,2)])
  shape <- v$cov
  center <- v$center
  chol_decomp <- chol(shape)
  radius <- sqrt(dfn * stats::qf(level, dfn, dfd))
  angles <- (0:segments) * 2 * pi/segments
  unit.circle <- cbind(cos(angles), sin(angles))
  ellipse <- as.data.frame(t(center + radius * t(unit.circle %*% chol_decomp)))
  centerd = as.data.frame(center)
  ellipse$centerLat = centerd[1,1]
  ellipse$centerLong = centerd[2,1]
  ellipse$distance = distm(ellipse[,c('long','lat')], ellipse[,c('centerLong','centerLat')], fun=distVincentyEllipsoid)
pi*(min(ellipse$distance)/2)*(max(ellipse$distance)/2)

How to calculate the area of a ggplot stat_ellipse() when 'type = "norm"?

3 Answers3

Linked