
For a large data set of species, I am trying to calculate the circular mean of a given set of months. For example, for a species that flowers from March to July, I want to know the mean month of flowering (i.e. May) as well as the variance around the mean.

Because months are circular (the mean for a species that flowers from December to February should be January), I am using circular statistics, specifically the R package `circular`. However, when I calculate the circular mean with the `circular` package and `units = "hours"`, I get values that are obviously wrong and look more like the linear mean.
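For reference, the usual definition of the circular mean can be written in base R without any package. Map each month to an angle (here January = 0 radians, the same convention as the radians conversion used later in the question), average the sines and cosines, and take `atan2` of the two averages. The length of that average vector, R_bar, gives one common measure of spread, the circular variance 1 - R_bar. This is a minimal sketch assuming month numbers 1 to 12; `month_circ_stats` is a hypothetical helper, not a function from `circular`.

```r
# Sketch: circular mean month and circular variance in base R (no packages).
# Assumes month numbers 1-12; January is mapped to 0 radians.
# month_circ_stats() is a hypothetical helper, not part of the circular package.
month_circ_stats <- function(months) {
  theta <- (months - 1) * 2 * pi / 12          # month -> angle (radians)
  c_bar <- mean(cos(theta))                     # mean cosine
  s_bar <- mean(sin(theta))                     # mean sine
  mean_theta <- atan2(s_bar, c_bar)             # mean direction in [-pi, pi]
  r_bar <- sqrt(c_bar^2 + s_bar^2)              # mean resultant length, 0 to 1
  # angle -> month; rounding can stop a tiny negative value from wrapping to month 13
  mean_month <- (round(mean_theta * 12 / (2 * pi), 6) %% 12) + 1
  list(mean_month = mean_month, circ_var = 1 - r_bar)
}

month_circ_stats(3:7)$mean_month          # March-July: 5 (May)
month_circ_stats(c(12, 1, 2))$mean_month  # December-February: 1 (January)
month_circ_stats(c(12, 1, 2))$circ_var    # small value: months tightly clustered
```

When `circ_var` is close to 1, the months are spread almost evenly around the year and the mean month carries little information.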

Here's a simplified example:

library(circular)  #to install: install.packages("circular")

#generate example data, from Nov (i.e. 11) to March (i.e. 3)
df <- data.frame(X = c(rep(paste("species", 2), 5)),
                 Y = c(1:3, 11:12))

df$Y <- circular::circular(df$Y, units = "hours", template = "clock12") #convert to circular variable

circular::mean.circular(df$Y) #calculate circular mean

#should return mean of 1 (January) but instead returns:

#Circular Data: 
#Type = angles 
#Units = hours 
#Template = clock12 
#Modulo = asis 
#Zero = 1.570796 
#Rotation = clock 
#[1] 4.774558

When I convert my month values to radians as per this post, I get the correct mean for my simplified example:

df$Yrad <- ((df$Y-1)*(2*pi/12)) #convert months to radians with 0 radians = January

circmean <- circular::mean.circular(circular(df$Yrad, units = "radians"))

circmean <- ((circmean + 12) %% 12) + 1 #convert mean from radians back to months

#gives mean of 1 i.e. January!

However, this method doesn't give the correct means for more complicated example data. For a species that flowers from May (5) to September (9), it gives a circular mean of 4.14 (i.e. April to May), when I would expect the circular mean to be 7 (July):

library(tidyverse) #to install: install.packages("tidyverse")
library(circular)  #to install: install.packages("circular")

#generate example data
df2 <- data.frame(X = c(rep(paste("species", 1), 5), rep(paste("species", 2), 5), 
                       rep(paste("species", 3), 4), rep(paste("species", 4), 6)),
                 Y = c(5:9, 1:3, 11:12, 1:2, 11:12, 3, 5, 8, 9, 10, 12))

df2$Yrad <- ((df2$Y-1)*(2*pi/12)) #convert months to radians with 0 radians = January
df2$Yrad <- circular::circular(df2$Yrad, units = "radians") #convert Yrad to circular variable

#calculate circular mean for each species in column X
circmean <- df2 %>%
  dplyr::group_by(X) %>%
  dplyr::summarise(circ_mean = Yrad %>%
                     circular::mean.circular()) %>%
  ungroup()

circmean$circmeanmonth <- ((circmean$circ_mean + 12) %% 12) + 1 #convert mean from radians back to months

#returns below - circular mean looks correct for species 2 and maybe species 3?
#>X           circ_mean       circmeanmonth
#>species 1   3.141593e+00    4.141593
#>species 2   -2.379867e-16   1.000000
#>species 3   -2.617994e-01   12.738201
#>species 4   -1.986080e+00   11.013920

If I don't convert to radians and use `units = "hours"`, `circular` seems to calculate the circular mean correctly only for species 1 of the example above. If I do convert to radians and use `units = "radians"`, I get the correct answer for species 2 and maybe species 3. Species 4 is a more complicated case, and one that is common in my data. How do I get the correct circular mean in all cases? Should I try a different package, or have I misunderstood something about how circular means are calculated?

saltbush
2 Answers


As far as I can tell, "clock12" doesn't actually compute on a 12-hour clock, i.e. it doesn't wrap from 12 to 0 (even though the display does). `mean(2*df$Y)` does work as expected. Note that `?circular` says

template: how the data should be plotted

(i.e., not how it should be processed). So I don't think (unfortunately) that you can actually use "clock12" as a substitute for months (i.e., circular data with a period of 12).
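This can be checked in base R (a sketch, not the package's own code; `circ_mean_period` is a hypothetical helper). With `units = "hours"`, one full turn is 24 hours, so the 4.774558 in the question is what you get when 1, 2, 3, 11 and 12 are averaged on a 24-hour circle, where 11 and 12 sit roughly half a circle away from 1 to 3 instead of next to them. Averaging the same numbers on a 12-unit circle gives about 1 (January).

```r
# Sketch: circular mean of values on a circle with a given period (base R only).
# circ_mean_period() is a hypothetical helper, not a function from `circular`.
circ_mean_period <- function(x, period) {
  theta <- x * 2 * pi / period                     # value -> angle (radians)
  m <- atan2(mean(sin(theta)), mean(cos(theta)))    # mean direction
  (m * period / (2 * pi)) %% period                 # angle -> value, wrapped
}

y <- c(1:3, 11:12)        # Nov-Mar month numbers from the question

circ_mean_period(y, 24)   # as hours on a 24-hour circle: about 4.7746
circ_mean_period(y, 12)   # as months on a 12-month circle: about 1 (January)
```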

It would be a nice project to hack/update/create a "months" template/type for the package ...

Ben Bolker
  • Thanks Ben! Good to know. For me `mean(2*df$Y)` gives a mean of 2, not 1 as I would have expected? Also I've edited my question and added more example data as I realised that converting to radians doesn't seem to calculate the correct circular mean in all cases, or at least not as I would expect it from simplified data. – saltbush Mar 23 '21 at 03:36
  • Yes, it gives a mean of 2. My point was that multiplying by 2 maps the data to the scale that `circular` is expecting (0-24); you can then convert back. It's a hack, but `mean(df$Y*2)/2` should give correct answers (just make sure to document this hack if you use it!) I'll check your other example tomorrow, if I have time ... – Ben Bolker Mar 23 '21 at 04:10
  • Thanks Ben! I've tried this on my more complex example and it seems to work great, as long as I then convert the means back to modulo 12 with `%% 12`. Now to try it on my enormous data set! – saltbush Mar 23 '21 at 05:01
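A base R sketch of why the doubling hack works (`circ_mean_period` is a hypothetical helper, not part of `circular`): doubling the month numbers stretches a 12-month cycle over a 24-hour circle, so averaging there and halving gives the same angle as averaging on a 12-month circle directly. In this scaling, month 12 maps to angle 0, so a result of 0 should be read as December.

```r
# Sketch: why "double, average on a 24-hour circle, halve" matches a 12-month mean.
# circ_mean_period() is a hypothetical base R helper, not part of `circular`.
circ_mean_period <- function(x, period) {
  theta <- x * 2 * pi / period                      # value -> angle (radians)
  m <- atan2(mean(sin(theta)), mean(cos(theta)))     # mean direction
  (m * period / (2 * pi)) %% period                  # angle -> value, wrapped
}

y <- c(1:3, 11:12)   # Nov-Mar month numbers from the question

hack   <- (circ_mean_period(2 * y, 24) / 2) %% 12   # double, 24-hour circle, halve
direct <- circ_mean_period(y, 12)                   # 12-month circle directly

# 2*y on a 24-hour circle gives the same angles as y on a 12-month circle
all.equal(hack, direct)
```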

Should it be this?

#calculate circular mean for each species in column X
circmean <- df2 %>%
  dplyr::group_by(X) %>%
  dplyr::summarise(circ_mean = mean(Yrad), n = n())

#convert mean from radians back to months (have to round to avoid rounding errors with %%)
circmean$circmeanmonth <- (round(circmean$circ_mean * 12 / (2*pi), digits = 3) %% 12) + 1

It produces:

# A tibble: 4 x 4
  X         circ_mean         n circmeanmonth
  <fct>     <circular>    <int> <circular>   
1 species 1  3.141593e+00     5  7.000       
2 species 2 -2.379867e-16     5  1.000       
3 species 3 -2.617994e-01     4 12.500       
4 species 4 -1.986080e+00     6  9.207
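The same per-species result can be sketched in base R without the `circular` package (`mean_month` is a hypothetical helper; the data are `df2` from the question). The key step is the one missing from the question's code: multiply the mean angle by 12 / (2*pi) to turn radians into months before taking `%% 12`.

```r
# Sketch: per-species circular mean month in base R, using df2 from the question.
# mean_month() is a hypothetical helper; January is mapped to 0 radians.
df2 <- data.frame(X = c(rep("species 1", 5), rep("species 2", 5),
                        rep("species 3", 4), rep("species 4", 6)),
                  Y = c(5:9, 1:3, 11:12, 1:2, 11:12, 3, 5, 8, 9, 10, 12))

mean_month <- function(months) {
  theta <- (months - 1) * 2 * pi / 12                 # month -> radians
  m <- atan2(mean(sin(theta)), mean(cos(theta)))      # mean direction, radians
  # radians -> months *before* wrapping; rounding keeps a tiny negative angle
  # from wrapping to 12 (i.e. month 13)
  (round(m * 12 / (2 * pi), 6) %% 12) + 1
}

sapply(split(df2$Y, df2$X), mean_month)
# species 1: 7, species 2: 1, species 3: 12.5, species 4: about 9.207
```

Species 3 comes out at 12.5, halfway between December and January, as expected for data spanning November to February. Species 4's months are spread widely around the year (its mean resultant length is only about 0.2), so its mean month is much less informative than the others.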
Gregory McIntyre
  • Thanks Greg! This also works, though I think if I can avoid converting to radians I'd prefer to as with Ben's workaround above – saltbush Mar 23 '21 at 05:10