3

The bfast() function in package bfast is supposed to be able to detect both breakpoints in long-term trends and changes in the seasonal component. One example is this graph (source):
enter image description here
In this graph, subplot no. 2 shows a detected change in seasonality, while no. 3 shows a breakpoint in the trend.

However, I don't understand how to tell bfast() to look for changes/breakpoints in seasonality. All I get is breakpoints in the long-term trend. Here is a reproducible example, simulating a 50-year time series with weekly measurements of the seasonal variable y (i.e., 52 measurements per year):

n_years <- 50
freq <- 52
y_pattern <- sin(seq(0, 2*pi, length = freq))
y <- rep(y_pattern, n_years) + rnorm(freq*n_years, sd = 0.1)
mydata <- data.frame(Year = rep(1:n_years, each = freq), Week = rep(1:freq, n_years), y)  

These data display a constant seasonal trend in the data, with an annual peak around week 13. Now, let us introduce a shift in seasonality in year 25, shifting the seasonal cycle 8 weeks later for the years 26-59:

move_data <- function(data, year, weeks_to_move){
  x <- data[data$Year == year, "y"]
  c(x[seq(52 - weeks_to_move + 1,52)], x[seq(1, 52 - weeks_to_move)])
}

mydata$y_shifted <- mydata$y
for (year in 26:50){
  mydata$y_shifted[mydata$Year == year] <- move_data(mydata, year, weeks_to_move = 8)
}

The variable y_shifted now has the annual peak around week 13 in years 1-25 and and around week 21 in years 26-52. Let us plot it, compared to the 'unshifted' variable y:

mydata$Phase <- ifelse(mydata$Year <= 25, "Year 1-25", "Year 26-50")
mydata %>%
  tidyr::gather("y_variable", "value", y, y_shifted) %>%
  ggplot(aes(Week, value, group = Year, color = Phase)) + geom_line() +
  facet_grid(.~y_variable)

[Annual cycle of ]y and y_shifted[3]

This abrupt shift in seasonality should be easy to detect. However, when I run `bfast(), it doesn't detect any change:

y_ts <- ts(mydata$y_shifted, start = c(1,1), frequency = freq)
fit <- bfast(y_ts, h=.15, season="harmonic", max.iter=20, breaks=3)
plot(fit)

enter image description here

As you can see, no change is detected in the seasonality (subplot 2 above). The residuals (subplot 4) picks up the change in seasonality, which is clear if we plot residuals by day-of-the-year:

mydata$Residuals <- fit$output[[1]]$Nt
ggplot(mydata, aes(Week, Residuals, group = Year, color = Phase)) + geom_point()

Residuals vs day-of-the-year, marked by year 1-25 and 26-50

I have a feeling that there is some parameter or option I need to change in order to make bfast() look for changes in seasonality, but which? I haven't been able to dig out this info from the documentation.

Dag Hjermann
  • 1,960
  • 14
  • 18

2 Answers2

5

I got the same problem when testing bfast on my consumer portfolio data and failed to find any real solution. I went on to dig into the bfast literature from the earth sensing community, which is where bfast was first developed and extensively used. My read is that there was really little you can do to get bfast to always fit a useful seasonal component.

Days ago, I encountered this Quora discussion on the best software for time series analysis and found there is a new R package Rbeast for breakpoint detection and time series decomposition. There is also a nice tweet that shows a quick comparison between bfast and Rbeast.

After some experimenting, I found Rbeast was able to pinpoint seasonal breakpoints in my data and yours. To be frank, I still have no idea how Rbeast works. The BEAST algorithm in Rbeast seems rather complicated, with tons of outputs; it is not well documented and not as easy to use as bfast. Let me show what I got, first using your data and then using a second artificial time series.

Your data

# The original code to generate your data
n_years <- 50
freq    <- 52
y_pattern <- sin(seq(0, 2*pi, length = freq))
y         <- rep(y_pattern, n_years) + rnorm(freq*n_years, sd = 0.1)
mydata    <- data.frame(Year = rep(1:n_years, each = freq), Week = rep(1:freq, n_years), y) 

move_data <- function(data, year, weeks_to_move){
  x <- data[data$Year == year, "y"]
  c(x[seq(52 - weeks_to_move + 1,52)], x[seq(1, 52 - weeks_to_move)])
}

mydata$y_shifted <- mydata$y
for (year in 26:50){
  mydata$y_shifted[mydata$Year == year] <- move_data(mydata, year, weeks_to_move = 8)
}
 
# You data analyzed by the BEAST algorithm in Rbeast
library(Rbeast) 
fit <- beast(mydata$y_shifted, freq=52)
print(fit)
plot(fit)
#####################################################################
#                      Seasonal  Changepoints                       #
#####################################################################
.-------------------------------------------------------------------.
| Ascii plot of probability distribution for number of chgpts (ncp) |
.-------------------------------------------------------------------.
|Pr(ncp = 0 )=0.000|*                                               |
|Pr(ncp = 1 )=0.999|*********************************************** |
|Pr(ncp = 2 )=0.001|*                                               |
|Pr(ncp = 3 )=0.000|*                                               |
|Pr(ncp = 4 )=0.000|*                                               |
|Pr(ncp = 5 )=0.000|*                                               |
|Pr(ncp = 6 )=0.000|*                                               |
|Pr(ncp = 7 )=0.000|*                                               |
|Pr(ncp = 8 )=0.000|*                                               |
|Pr(ncp = 9 )=0.000|*                                               |
|Pr(ncp = 10)=0.000|*                                               |
.-------------------------------------------------------------------.
|    Summary for number of Seasonal ChangePoints (scp)              |
.-------------------------------------------------------------------.
|ncp_max    = 10   | MaxSeasonKnotNum: A parameter you set          |
|ncp_mode   = 1    | Pr(ncp= 1)=1.00: There is a 99.9% probability  |
|                  | that the seasonal component has  1 chgnpt(s).  |
|ncp_mean   = 1.00 | Sum{ncp*Pr(ncp)} for ncp = 0,...,10            |
|ncp_pct10  = 1.00 | 10% percentile for number of changepoints      |
|ncp_median = 1.00 | 50% percentile: Median number of changepoints  |
|ncp_pct90  = 1.00 | 90% percentile for number of changepoints      |
.-------------------------------------------------------------------.
| List of probable seasonal changepoints ranked by probability of   |
| occurrence: Please combine the ncp reported above to determine    |
| which changepoints below are  practically meaningful              |
'-------------------------------------------------------------------'
|scp#              |time (cp)                  |prob(cpPr)          |
|------------------|---------------------------|--------------------|
|1                 |1301.000000                |1.00000             |
.-------------------------------------------------------------------.

enter image description here

The abrupt seasonal shift was detected precisely. Rbeast also gives the probability of detecting breakpoints in seasonality and trend (the red ad green curves in the Pr(scp) and Pr(tcp) panels of the above figure). The probability for the detected seasonal shift is very high, close to 1.0. The trend of your data is a flat line. It is essentially a constant of zero and the probability of finding breakpoints (i.e., changepoints as used in Rbeast) in trend is also close to zero all over.

A second time series

A cool feature of Rbeast is the estimation of sin and cos orders for the harmonic seasonal model. Below, I generated a time series that has three seasonal segments (i.e., two breaks) plus a sloped trend with no breaks. The three seasonal segments have different sin orders, taking 1, 2, and 3, respectively.

# Generate a sample time series with three seasonal segments
# the sin/cos orders for the three segs are different.
seg1 <- 1:1000
seg2 <- 1001:2000
seg3 <- 2001:3000
new_data <- c( sin(seg1*2*pi/52), 0.6*sin( seg2*2*pi/52*2), 0.3*sin( seg3*2*pi/52*3)) + (1:3000)*0.0002+ rnorm(3000, sd = 0.1)
# Test bfast using new_data
y_ts <- ts(new_data, start = c(1,1), frequency = 52)
fit  <- bfast(y_ts, h=.15, season="harmonic", max.iter=20, breaks=3)
plot(fit)

Surprisingly enough, bfast didn't detect any breaks in seasonality, although the three segments are easily eye-balled in the plotted data Yt.

# Analyze the new_data time series using `Rbeast`

fit <- beast(new_data, freq=52)
print(fit)
plot(fit)
#####################################################################
#                      Seasonal  Changepoints                       #
#####################################################################
.-------------------------------------------------------------------.
| Ascii plot of probability distribution for number of chgpts (ncp) |
.-------------------------------------------------------------------.
|Pr(ncp = 0 )=0.000|*                                               |
|Pr(ncp = 1 )=0.000|*                                               |
|Pr(ncp = 2 )=0.969|*********************************************** |
|Pr(ncp = 3 )=0.031|**                                              |
|Pr(ncp = 4 )=0.000|*                                               |
|Pr(ncp = 5 )=0.000|*                                               |
|Pr(ncp = 6 )=0.000|*                                               |
|Pr(ncp = 7 )=0.000|*                                               |
|Pr(ncp = 8 )=0.000|*                                               |
|Pr(ncp = 9 )=0.000|*                                               |
|Pr(ncp = 10)=0.000|*                                               |
.-------------------------------------------------------------------.
|    Summary for number of Seasonal ChangePoints (scp)              |
.-------------------------------------------------------------------.
|ncp_max    = 10   | MaxSeasonKnotNum: A parameter you set          |
|ncp_mode   = 2    | Pr(ncp= 2)=0.97: There is a 96.9% probability  |
|                  | that the seasonal component has  2 chgnpt(s).  |
|ncp_mean   = 2.03 | Sum{ncp*Pr(ncp)} for ncp = 0,...,10            |
|ncp_pct10  = 2.00 | 10% percentile for number of changepoints      |
|ncp_median = 2.00 | 50% percentile: Median number of changepoints  |
|ncp_pct90  = 2.00 | 90% percentile for number of changepoints      |
.-------------------------------------------------------------------.
| List of probable seasonal changepoints ranked by probability of   |
| occurrence: Please combine the ncp reported above to determine    |
| which changepoints below are  practically meaningful              |
'-------------------------------------------------------------------'
|scp#              |time (cp)                  |prob(cpPr)          |
|------------------|---------------------------|--------------------|
|1                 |2001.000000                |1.00000             |
|2                 |1001.000000                |1.00000             |
|3                 |1027.000000                |0.02942             |
.-------------------------------------------------------------------.

enter image description here

The above is the Rbeast result. The two breaks and the three seasonal segments are recovered. There are no breaks in the trend Rbeast estimated seasonal harmonic orders. In the Order_s panel above, the three sin and cos orders are recovered correctly. The Order_s curve also shows the locations of the two seasonal breaks.

  • Benhmad: As the author of Rbeast, I appreciate your testing it out with detailed examples here as well as your criticisms about the usage and documentation of Rbeast. I made a new version of it and now available at https://github.com/zhaokg/Rbeast or `cran.r-project.org/web/packages/Rbeast/index.html`. The API is slightly different from the old version, hopefully making it a little bit easier. The API used in your post here become deprecated. I also provide more examples in the help document (i.e., ?Rbeast::beast) – zhaokg Apr 08 '22 at 05:11
  • @zhaokg, thank you for the update. I edited my answer slightly per the latest package fetched from your github site. – François Benhmad Apr 11 '22 at 05:26
1

Besides trying a different toolkit, bfast has several parameters. The most important one is the h parameter. Oftentimes, I found the results may change a lot when varying the value of this parameter.

sshwang
  • 11
  • 3