I have time series data for 1,977 customers. Each customer has 17,544 data points (hourly data for 2 years). I am trying to group the customers into clusters based on their time series. My code is below; the variable "list_of_lists" is a list in which each element is the list of time series values for one customer.
from dtw import dtw
import numpy as np
# Define a custom distance function
def my_dist(x, y):
    return np.abs(x - y)

# Compute the pairwise DTW distances
distances = np.zeros((len(list_of_lists), len(list_of_lists)))
for i in range(len(list_of_lists)):
    for j in range(i+1, len(list_of_lists)):
        x = list_of_lists[i]
        y = list_of_lists[j]
        distance, *rest = dtw(x, y, dist=my_dist)
        # The matrix is symmetric, so fill both halves
        distances[i,j] = distance
        distances[j,i] = distance
from sklearn.cluster import AgglomerativeClustering
# Perform hierarchical clustering on the distance matrix
# metric='precomputed' tells scikit-learn that the input is already a distance matrix
clustering = AgglomerativeClustering(n_clusters=cluster_count, metric='precomputed',
                                     linkage='average')
labels = clustering.fit_predict(distances)
print(labels)
However, my program takes a very long time to run.
Is there a way to write this so that the computation time needed to complete the task is minimized?
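
To give a sense of scale, here is a rough back-of-the-envelope estimate of the work the nested loop performs. It is only a sketch: it assumes unconstrained DTW, which fills one cell of a cost matrix for every combination of a point in x and a point in y, and it assumes the full matrix is held in memory as 8-byte floats (I have not verified how the dtw package stores it). The variable names are just for illustration.

```python
# Rough cost estimate for the all-pairs DTW loop (figures taken from the question).
n_series = 1977          # customers
series_len = 17_544      # hourly points over 2 years

# Number of unordered pairs (i < j) visited by the nested loop.
n_pairs = n_series * (n_series - 1) // 2

# Unconstrained DTW fills one cell per (point in x, point in y) combination.
cells_per_pair = series_len ** 2

# Total cell updates over all pairs.
total_cells = n_pairs * cells_per_pair

# Assumption: a full cost matrix of 8-byte floats is held in memory per pair.
bytes_per_matrix = cells_per_pair * 8

print(f"pairs:            {n_pairs:,}")
print(f"cells per pair:   {cells_per_pair:,}")
print(f"total cells:      {total_cells:.2e}")
print(f"matrix per pair:  {bytes_per_matrix / 1e9:.2f} GB")
```

So the loop makes about 1.95 million DTW calls, each touching roughly 308 million cells, which seems to be where the time goes.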