I am trying to use the proxy::dist function with a custom distance function, but what I have now is very slow.
This is a reproducible example of how I call my custom function:
set.seed(1)
test <- matrix(runif(4200), 60, 70)
train <- matrix(runif(4200), 60, 70)
dMatrix <- proxy::dist(x = test, y = train, method = customDTW,
                       by_rows = TRUE,
                       auto_convert_data_frames = TRUE)
which is supposed to calculate the distance between each time series in the test matrix and every time series in the train matrix (each row being a time series).
My custom function is:
library(dtw)  # provides dtw()

# DTW distance between two series, constrained by a Sakoe-Chiba band
customDTW <- function(ts1, ts2) {
  d <- dtw(ts1, ts2,
           dist.method = "Euclidean",
           window.type = "sakoechiba",
           window.size = 20)
  return(d$distance)
}
The problem is that, compared with using method = "DTW", or even with calculating the distance matrix myself, this is far slower, and it gets dramatically slower as the length of the time series or the number of them grows. Of course, part of this is rooted in the nested loop, but I am surprised by the scale of the effect. There must be another reason that I am not seeing.
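For context, a minimal sketch of what calculating the distance matrix "by myself" could look like. The exact code is not shown here, so the plain nested loop over rows and the name dManual are assumptions; customDTW and the data are the same as above.

```r
library(dtw)  # provides dtw()

customDTW <- function(ts1, ts2) {
  d <- dtw(ts1, ts2,
           dist.method = "Euclidean",
           window.type = "sakoechiba",
           window.size = 20)
  d$distance
}

set.seed(1)
test  <- matrix(runif(4200), 60, 70)
train <- matrix(runif(4200), 60, 70)

# Assumed manual version: one customDTW call per (test row, train row) pair.
# dManual is a hypothetical name, not taken from the original code.
dManual <- matrix(NA_real_, nrow = nrow(test), ncol = nrow(train))
for (i in seq_len(nrow(test))) {
  for (j in seq_len(nrow(train))) {
    dManual[i, j] <- customDTW(test[i, ], train[j, ])
  }
}
dim(dManual)
```

Both this loop and proxy::dist with a function as method evaluate customDTW once per pair of rows, so the number of dtw() calls should be the same in the two cases.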
My question is: how else could I implement customDTW to make it faster while still using proxy::dist?
This is my little experiment on the execution time:
Execution time for 60X7 (using proxy::dist + customDTW):
   user  system elapsed
  2.852   0.012   2.867
Execution time for 60X70 (using proxy::dist + customDTW):
   user  system elapsed
  5.384   0.000   5.382
Execution time for 60X700 (using proxy::dist + customDTW):
   user  system elapsed
509.088  18.652 529.115
Execution time for 60X700 (without using proxy::dist):
   user  system elapsed
 26.696   0.004  26.753
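A minimal sketch of how timings like these could be collected with system.time. The helper timeCustomDist and its arguments n and len are hypothetical names introduced for illustration, and reading "60X7" as 60 series of length 7 is an assumption. The call at the end uses a small size so that it finishes quickly.

```r
library(dtw)  # provides dtw(); proxy must also be installed

customDTW <- function(ts1, ts2) {
  d <- dtw(ts1, ts2,
           dist.method = "Euclidean",
           window.type = "sakoechiba",
           window.size = 20)
  d$distance
}

# Hypothetical helper: time proxy::dist + customDTW for n series of length len
timeCustomDist <- function(n, len) {
  set.seed(1)
  test  <- matrix(runif(n * len), n, len)
  train <- matrix(runif(n * len), n, len)
  system.time(
    proxy::dist(x = test, y = train, method = customDTW, by_rows = TRUE)
  )
}

# Small case only; larger values of len take much longer, as shown above
timing <- timeCustomDist(n = 10, len = 7)
print(timing)
```

Calling the helper with increasing values of len (for example 7, 70, 700) would show how the elapsed time scales with the series length.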