I am trying to cluster time series data from a longitudinal survey that includes participants’ yearly reported incomes. These income trajectories vary in length, so dynamic time warping seems like an appropriate tool for calculating a distance matrix.

Some experimenting revealed that how these trajectories are mapped to each other varies significantly depending on the assigned step pattern. I therefore would like to pick the most appropriate one for my dataset. I’m not very experienced with dynamic time warping, so I decided to try clustering on a small sample using distance matrices created by a range of step patterns and see which one had the best performance metrics.

To do this, I used the dtw package’s rabinerJuangStepPattern function, which can implement a “comprehensive set of step-patterns” as outlined in Rabiner & Juang (1993; I wasn't able to get a copy of this document). I therefore created a nested for loop to iterate through all configurations of the Rabiner-Juang set, and found that many of them threw the following error:

Error in dtw(… :  No warping path exists that is allowed by costraints

I’ve replicated this problem using a toy version of my data, which only tries to calculate distances relative to the first participant in the dataset:

# Import required libraries
library(tidyverse)
library(dtw)

# Set seed for reproducible results
set.seed(123)
# Generate lengths of sample income trajectories
lengths  = sample(8:27,500,replace = T) %>% as.list()
# Use rnorm to generate income trajectories of varying lengths, as defined above
inc_traj = list() %>% .[1:500] %>% map2(lengths, ~ rnorm(.y, 1588.647, 1484.186))
# Create list which allows comparison of all trajectories with that of the first participant
pairs    = list() %>% .[1:500] %>% map2(as.list(1:500), ~ inc_traj %>% .[c(1,.y)])

# Empty list of distances to populate using for loop below
distances = list()
# Empty vector of step pattern names to populate using for loop below
name = c()
# Define loop counter
l = 0

# For each Rabiner-Juang family
for (a in 1:7) {
  # For each slope-weighting sub-type
  for (b in 1:4) {
    # For both smoothed and unsmoothed applications
    for (c in 1:2) {
      
      # Increase loop count by 1
      l = l + 1
      
      # Use tryCatch so that a failing step pattern yields NULL instead of stopping the loop.
      # (`finally` takes an expression, not a function, so the original handler never ran;
      # the empty warning handler is dropped because it would discard an otherwise valid result.)
      temp = tryCatch({
        # Calculate the distance between the first income trajectory and all other trajectories in the sample
        pairs %>% map(~ dtw(.[[1]], .[[2]], keep.internals = TRUE, step.pattern = rabinerJuangStepPattern(a,letters[b],smoothed = {c == 1}))) %>% 
                  map(~ .$distance) %>% unlist()
      }, 
      # If dtw() throws an error, return NULL for this configuration
      error=function(err) NULL)
      
      # Store the result; list(NULL) keeps a placeholder so positions stay aligned with `name`
      # (assigning NULL with [[<- would delete the element instead)
      distances[l] = list(temp)
      
      # Add name to name list
      name[l] = paste0("Rabiner-Juang:",a,",",letters[b],",smoothed=",{c == 1})
      
      # Print for loop progress
      cat("\r", paste0("Rabiner-Juang:",a,",",letters[b],",smoothed=",{c == 1},". ",l," of ",7*4*2," calculated."))
      
    }
  }
}

# Assign names to all list objects
distances = distances %>% setNames(name)

# Get names of Rabiner-Juang step patterns that worked correctly
distances %>% map(~ !is.null(.)) %>% unlist() %>% .[. == T] %>% names()

The output from this code indicates that step patterns from families I, V and VII work correctly, whilst those from families II, III, IV and VI produce the error.
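One possible contributing factor, offered as a sketch rather than a confirmed diagnosis: some step patterns only allow a limited local slope, so two series whose lengths differ by a large factor may have no admissible path between their fixed end points. The check below uses two made-up series of lengths 8 and 27 (the extremes of the toy data) and a hypothetical helper `safe_dtw` that returns NA when dtw() signals an error. It assumes that family II only permits steps with slopes between 1/2 and 2, in which case a path from (1,1) to (8,27) cannot exist.

```r
library(dtw)

set.seed(1)
short_traj <- rnorm(8)    # made-up trajectory, shortest length in the toy data
long_traj  <- rnorm(27)   # made-up trajectory, longest length in the toy data

# Hypothetical helper: return the DTW distance, or NA if dtw() signals an error
safe_dtw <- function(x, y, pattern) {
  tryCatch(
    dtw(x, y, step.pattern = pattern, distance.only = TRUE)$distance,
    error = function(e) NA_real_
  )
}

# Family I (no slope limit) versus family II (assumed slope limit of 1/2 to 2)
d_family1 <- safe_dtw(short_traj, long_traj, rabinerJuangStepPattern(1, "c"))
d_family2 <- safe_dtw(short_traj, long_traj, rabinerJuangStepPattern(2, "c"))

print(c(family_I = d_family1, family_II = d_family2))
```

If family II returns NA here while family I returns a number, the failures in the loop may simply reflect pairs of trajectories whose length ratio exceeds what the pattern allows, rather than a mistake in the implementation.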


My questions are therefore as follows:

1) Why do some of these families work, whilst others produce an error? Is this because certain families are inappropriate for this kind of data, or is my implementation wrong?

2) Does anyone know of any theoretical reasons why some step-patterns might be preferred over others in this use-case?


Many thanks in advance for your time!!!


Citations:

Rabiner, L. R., & Juang, B.-H. (1993). Fundamentals of speech recognition. Englewood Cliffs, NJ: Prentice Hall.

    I don't have a complete answer, but some cases don't work due to windowing, see Figure 5 in [the package's vignette](https://cran.r-project.org/web/packages/dtw/vignettes/dtw.pdf). Additionally, the [documentation for Rabiner-Juang's set](https://rdrr.io/rforge/dtw/man/stepPattern.html) specifically says "Not all combinations of arguments make sense", although it doesn't say why. – Alexis Feb 11 '22 at 22:10