Find the biggest drops/rises in a time series without a loop (preferably using tidy/dplyr)?

Question

I have many time series and want to find a way to identify the top 10 greatest rises and falls for each time series.

This is not as easy as it sounds because the most prominent features on a time series can sometimes be interrupted by movements in the opposite direction, if only for a very short time (e.g. one period). This means any algorithm that simply scans for the most consecutive periods movement in the same direction usually fails to find the most prominent features (e.g. that a human would identify).

Are there any standard methods that can be used 'out of the box'?

For example, in the following chart, if asked to identify the most prominent falls, a human would probably point to the circled areas. How can we get code to identify these falls (as a human would)?

Note: I guess a convolutional neural network could probably do this, but I'm after simpler solutions if possible (it doesn't have to be perfect)

library(tidyverse)
library(priceR)
au <- historical_exchange_rates("AUD", to = "USD",
                          start_date = "2010-01-01", end_date = "2020-06-30")
au %>% 
  tail(365 * 8) %>% 
  rename(aud_to_usd = one_AUD_equivalent_to_x_USD) %>% 
  mutate(date = as.Date(date)) %>% 
  ggplot(aes(x = date, y = aud_to_usd, group = 1)) +
  geom_line() +
  geom_smooth(method = 'loess', se = TRUE) + 
  theme(axis.title.x=element_blank(),
        axis.ticks.x=element_blank()) + 
  scale_x_date(date_labels = "%Y", date_breaks = "1 year")  +
  ggtitle("AUD to USD over last 8 years")

teunbrand · Accepted Answer · 2020-08-03T10:10:50.327

Here is a function that you could use. It makes use of the run-length encoding of the timeseries into segments that rise or fall. It allows you to set a gap_width argument that instructs how long interruptions of stretches are allowed to be. It is in base R, it is not perfect, but seems to work decently for the case you presented above.

rise_and_falls <- function(value, time, gap_width = 5, top = 10, type = "fall") {
  type <- match.arg(type, c("fall", "rise"))
  if (type == "fall") {
    rle <- rle(sign(diff(value)) == -1)
  } else {
    rle <- rle(sign(diff(value)) == 1)
  }
  rle$values <- !rle$values & rle$lengths <= gap_width | rle$values
  rle <- rle(inverse.rle(rle)) # Clean up changed runs
  df <- data.frame(
    start = cumsum(rle$lengths) - rle$lengths + 1,
    end = cumsum(rle$lengths),
    len = rle$lengths,
    drop = rle$values
  )
  df <- transform(
    df,
    start_value = value[start],
    end_value = value[end],
    start_time = time[start],
    end_time = time[end]
  )
  df$diff <- df$start_value - df$end_value
  df <- df[order(df$diff),]
  if (type == "fall") {
    tail(df, top)
  } else {
    head(df, top)
  }
}

I recommend you use it as follows:

au %>% 
  tail(365 * 8) %>% 
  rename(aud_to_usd = one_AUD_equivalent_to_x_USD) %>% 
  mutate(date = as.Date(date)) -> au

df <- rise_and_falls(au$aud_to_usd, au$date, type = "fall")

ggplot(au, aes(x = date, y = aud_to_usd, group = 1)) +
  geom_line() +
  geom_smooth(method = 'loess', se = TRUE) + 
  theme(axis.title.x=element_blank(),
        axis.ticks.x=element_blank()) + 
  scale_x_date(date_labels = "%Y", date_breaks = "1 year")  +
  ggtitle("AUD to USD over last 8 years") +
  geom_segment(data = df, aes(x = start_time, y = start_value,
                              xend = end_time, yend = end_value),
               size = 2, colour = "red")

If somebody wants to improve this, it probably makes sense to cut-off the stretches at the local extrema.

Another option would be to smooth the line with a Gaussian kernel first and then run the rise_and_falls() function with gap_width = 0.

Find the biggest drops/rises in a time series without a loop (preferably using tidy/dplyr)?

1 Answers1