For a large data set of species, I am trying to calculate the circular mean of a given set of months. For example, for a species that flowers from March to July, I want to know the mean month of flowering (i.e. May) and the variance around that mean.
Because months are circular, the mean for a species that flowers from December to February should be January. I am therefore using circular statistics to calculate the means, in particular the R package circular. However, when I calculate the circular mean using the circular package with units = "hours", I get values that are obviously wrong and look more like the linear mean.
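To make explicit what I mean by "circular mean" and "variance", here is a minimal base R sketch of the calculation done by hand, without the circular package. The helper names months_to_rad, circ_mean_month and circ_var_month are my own and are not functions from any package. The sketch assumes months are coded 1-12, with January at 0 radians and each month spanning 2*pi/12 radians, and it uses one common definition of circular variance (1 minus the mean resultant length).

```r
# Hypothetical helpers (not part of the circular package).
# Assumption: months are coded 1-12, January maps to 0 radians,
# and each month spans 2*pi/12 radians.
months_to_rad <- function(months) (months - 1) * 2 * pi / 12

circ_mean_month <- function(months) {
  theta <- months_to_rad(months)
  # Mean direction from the averaged sine and cosine components (between -pi and pi)
  mean_angle <- atan2(mean(sin(theta)), mean(cos(theta)))
  # Radians back to month units; rounding removes floating-point noise near zero
  month0 <- round(mean_angle * 12 / (2 * pi), 10)
  (month0 %% 12) + 1
}

circ_var_month <- function(months) {
  theta <- months_to_rad(months)
  # Mean resultant length: 1 = all months identical, 0 = no preferred direction
  r_bar <- sqrt(mean(sin(theta))^2 + mean(cos(theta))^2)
  1 - r_bar  # circular variance, between 0 and 1
}

circ_mean_month(c(1:3, 11:12))  # November to March, expect 1 (January)
circ_mean_month(5:9)            # May to September, expect 7 (July)
circ_var_month(5:9)             # spread around the mean, on a 0-1 scale
```

The mean is returned on the 1-12 month scale, so a value such as 12.5 would mean halfway between December and January.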
Here's a simplified example:
library(circular) #to install: install.packages("circular")
#generate example data, from Nov (i.e. 11) to March (i.e. 3)
df <- data.frame(X = rep(paste("species", 2), 5),
                 Y = c(1:3, 11:12))
df$Y <- circular::circular(df$Y, units = "hours", template = "clock12") #convert to circular variable
circular::mean.circular(df$Y) #calculate circular mean
#should return mean of 1 (January) but instead returns:
#Circular Data:
#Type = angles
#Units = hours
#Template = clock12
#Modulo = asis
#Zero = 1.570796
#Rotation = clock
#[1] 4.774558
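The value 4.774558 can be reproduced by hand if the numbers are placed on a 24-unit circle rather than a 12-unit one. The following is only a sketch in base R. That units = "hours" is being treated as a 24-hour cycle here is my assumption based on the arithmetic, not something I have confirmed in the package documentation.

```r
# Assumption: a full circle corresponds to 24 units (a 24-hour clock), not 12
y <- c(1:3, 11:12)
theta <- y * 2 * pi / 24                                  # position on a 24-unit circle, in radians
mean_angle <- atan2(mean(sin(theta)), mean(cos(theta)))  # mean direction in radians
mean_on_24 <- (mean_angle * 24 / (2 * pi)) %% 24         # back to the 24-unit scale
print(mean_on_24)  # compare with the 4.774558 returned above
```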
When I convert my month values to radians as per this post I get the correct mean for my simplified example:
df$Yrad <- ((df$Y-1)*(2*pi/12)) #convert months to radians with 0 radians = January
circmean <- circular::mean.circular(circular::circular(df$Yrad, units = "radians"))
circmean <- ((circmean + 12) %% 12) + 1 #convert mean from radians back to months
#gives mean of 1 i.e. January!
However, this method does not give the correct means for more complicated example data. It gives a circular mean of 4.14 (i.e. April-May) for a species that flowers from May (5) to September (9), when I would expect the circular mean to be 7 (July):
library(tidyverse) #to install: install.packages("tidyverse")
library(circular) #to install: install.packages("circular")
#generate example data
df2 <- data.frame(X = c(rep(paste("species", 1), 5), rep(paste("species", 2), 5),
rep(paste("species", 3), 4), rep(paste("species", 4), 6)),
Y = c(5:9, 1:3, 11:12, 1:2, 11:12, 3, 5, 8, 9, 10, 12))
df2$Yrad <- ((df2$Y-1)*(2*pi/12)) #convert months to radians with 0 radians = January
df2$Yrad <- circular::circular(df2$Yrad, units = "radians") #convert Yrad to circular variable
#calculate circular mean for each species in column X
circmean <- df2 %>%
  dplyr::group_by(X) %>%
  dplyr::summarise(circ_mean = circular::mean.circular(Yrad)) %>%
  dplyr::ungroup()
circmean$circmeanmonth <- ((circmean$circ_mean + 12) %% 12) + 1 #convert mean from radians back to months
#returns below - circular mean looks correct for species 2 and maybe species 3?
#>X circ_mean circmeanmonth
#>species 1 3.141593e+00 4.141593
#>species 2 -2.379867e-16 1.000000
#>species 3 -2.617994e-01 12.738201
#>species 4 -1.986080e+00 11.013920
If I don't convert to radians and use units = "hours", circular seems to calculate the circular mean correctly for species 1 only. If I do convert to radians and use units = "radians", I get the correct answer for species 2 and maybe species 3. Species 4 is a more complicated case of a kind that is common in my data. How do I get the correct circular mean in all cases? Should I try a different package, or have I misunderstood something about the calculation of circular means?