
I am new to DTW and am trying to apply it to a dataset with ~700,000 rows and 9 features. I have two arrays (matrices) of the form:

[
   [0 1 0 0 0 0 0 0 0],
   [0 0 0 0 1 0 0 0 0],
   ...
   [0 0 0 0 0 0 1 0 0],
   [0 0 1 0 0 0 0 0 0],
]
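
For reference, here is what DTW computes on two multivariate series like these. This is a minimal pure-Python sketch of the classic exact dynamic-programming recurrence. It treats each row as one time step and uses the Euclidean distance between rows as the local cost. The function name `dtw` is hypothetical and is not the API of fastdtw or dtaidistance, although it returns a `(distance, path)` pair in the same shape that fastdtw does. It fills an n × m table, so it is only practical for short series. If both series have ~700,000 rows, the table would have about 4.9 × 10^11 cells, which is the cost that approximations such as FastDTW (advertised as O(N) time and memory) are designed to avoid.

```python
import math


def dtw(x, y):
    """Exact DTW between two sequences of equal-length feature vectors.

    Returns (distance, path), where path is a list of (i, j) index pairs
    aligning row i of x with row j of y.
    """
    n, m = len(x), len(y)
    inf = math.inf
    # cost[i][j] = cheapest cumulative cost of aligning x[:i] with y[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(x[i - 1], y[j - 1])  # Euclidean distance between rows
            cost[i][j] = d + min(cost[i - 1][j],      # step from the cell above
                                 cost[i][j - 1],      # step from the cell to the left
                                 cost[i - 1][j - 1])  # diagonal step
    # Backtrack from the bottom-right corner, always moving to the cheapest
    # predecessor (ties prefer the diagonal), to recover the warping path.
    path = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        steps = {(i - 1, j - 1): cost[i - 1][j - 1],
                 (i - 1, j): cost[i - 1][j],
                 (i, j - 1): cost[i][j - 1]}
        i, j = min(steps, key=steps.get)
    path.reverse()
    return cost[n][m], path
```

For example, `dtw([[0, 1, 0], [0, 0, 1]], [[0, 1, 0], [0, 1, 0], [0, 0, 1]])` returns `(0.0, [(0, 0), (0, 1), (1, 2)])`. The repeated row in the second series is absorbed by the warping, so the distance is zero.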

I have explored the fastdtw and dtaidistance packages. fastdtw computes the distance between the two matrices above in around 5 minutes. I would also like to visualize the results and apply hierarchical clustering, but I couldn't find any functions in fastdtw for visualizing the path/results or for clustering.
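
fastdtw returns a `(distance, path)` pair, where `path` is a list of `(i, j)` index pairs aligning row `i` of the first series with row `j` of the second. Because the path is plain data, any plotting tool can visualize it. As a minimal, dependency-free sketch, the path can be drawn as a text grid. The helper name `render_path` is hypothetical and not part of fastdtw, and the grid is only readable for short series:

```python
def render_path(path, n, m):
    """Render a DTW warping path as a text grid.

    Rows are indices into the first series (0..n-1), columns are indices
    into the second series (0..m-1); '#' marks cells on the path.
    """
    on_path = set(path)
    rows = []
    for i in range(n):
        rows.append("".join("#" if (i, j) in on_path else "." for j in range(m)))
    return "\n".join(rows)
```

For example, `render_path([(0, 0), (0, 1), (1, 2)], 2, 3)` returns `"##.\n..#"`. For long series, the same path can be plotted with matplotlib (`import matplotlib.pyplot as plt`) as `plt.plot(*zip(*path))`, which draws the index of the first series against the index of the second.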

dtaidistance does provide these functions, but it takes too long to run: I ran it on the same two series, and it was still running after 15-20 minutes. Is there any way to speed this up? Or can I do the clustering and visualization using the results of fastdtw?
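
Hierarchical clustering only needs pairwise distances, so in principle any function that returns a single DTW distance, including fastdtw, can feed it. A minimal sketch, assuming a modest list of series and a two-argument distance function such as `lambda a, b: fastdtw(a, b, dist=euclidean)[0]` (with `euclidean` from `scipy.spatial.distance`). The helper name `condensed_distances` is hypothetical:

```python
from itertools import combinations


def condensed_distances(series, distance):
    """Pairwise distances in SciPy's condensed (upper-triangle, row-major) order.

    `series` is a list of time series; `distance(a, b)` must return one number,
    e.g. the first element of fastdtw's (distance, path) result.
    """
    # combinations() yields (0, 1), (0, 2), ..., (1, 2), ..., which is the
    # pair order SciPy's condensed distance format uses.
    return [distance(series[i], series[j])
            for i, j in combinations(range(len(series)), 2)]
```

For example, `condensed_distances([0, 3, 10], lambda a, b: abs(a - b))` returns `[3, 10, 7]`. With SciPy installed, the list can be passed to `scipy.cluster.hierarchy.linkage(condensed, method="average")`, and the resulting linkage matrix to `scipy.cluster.hierarchy.dendrogram` to plot the tree. SciPy documents the `centroid`, `median`, and `ward` methods as correct only for Euclidean distances, so `single`, `complete`, or `average` are safer choices with DTW. The function performs n(n-1)/2 DTW computations, so it is only practical on a small subset of the data.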

I would really appreciate some help regarding this.

araina
  • While I can't help you with this specific problem, I'd like to point out that, if I understand your data correctly, and you have 700k series, hierarchical clustering would require (strictly speaking) a `700k x 700k` matrix, and even if you only save the lower triangle, that would require ~200GiB of RAM. I doubt you'll be able to manage that. – Alexis Feb 15 '19 at 22:01
  • @Alexis I do understand that, I intended clustering to be a different question. The clustering will be done on a small part of the file, not the entire thing. I was looking for a package that could handle a large file (like fastdtw), and at the same time provide functionality for visualization (like dtaidistance). I apologize if the description is a bit misleading. – araina Feb 15 '19 at 22:45
  • If you can use other languages, maybe [IncDTW](https://cran.r-project.org/web/packages/IncDTW/vignettes/Incremental_Dynamic_Time_Warping.html) could work. – Alexis Feb 16 '19 at 09:27
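
A quick back-of-the-envelope check of the memory that a condensed (one triangle only) distance matrix needs. This is a small sketch assuming a fixed number of bytes per pairwise distance. The function name `condensed_matrix_bytes` is hypothetical:

```python
def condensed_matrix_bytes(n_series, bytes_per_entry=8):
    """Bytes needed to store the n*(n-1)/2 pairwise distances of n series."""
    n_pairs = n_series * (n_series - 1) // 2
    return n_pairs * bytes_per_entry


TIB = 2 ** 40
GIB = 2 ** 30
print(condensed_matrix_bytes(700_000) / TIB)     # float64 (8 bytes): ~1.78 TiB
print(condensed_matrix_bytes(700_000, 1) / GIB)  # 1 byte per entry: ~228 GiB
```

The ~200 GiB figure in the comment above corresponds to roughly one byte per distance. With float64 distances, the triangle alone is about 1.8 TiB. Either way, clustering all 700k series at once is impractical, which is why clustering a small subset is the realistic route.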
