My dataframe has one column for the item, followed by one column per timestamp. Each timestamp column holds the number of sales of each item at that time. Below is just a small example; the real dataframe has hundreds of rows and hundreds of timestamp columns.
import pandas as pd

d = {'item': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
     '2019-07-25 17:00:00': [0, 2, 3, 5, 6, 7, 0, 1, 9, 10],
     '2019-07-26 8:00:00': [0, 2, 3, 0, 3, 5, 0, 1, 9, 10],
     '2019-07-26 16:00:00': [0, 1, 3, 5, 6, 7, 0, 2, 9, 1],
     '2019-07-27 21:00:00': [0, 2, 3, 5, 3, 7, 0, 1, 4, 10]}
df = pd.DataFrame(d)
df
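In case it is relevant, a few items in this sample have exactly the same sales value at every timestamp, so their series have zero variance. Here is a minimal sketch that lists them using only the plain dictionary, with no third-party packages. The helper name `constant_items` is my own and purely illustrative.

```python
# Same sample data as in the dataframe above, kept as a plain dict.
d = {'item': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
     '2019-07-25 17:00:00': [0, 2, 3, 5, 6, 7, 0, 1, 9, 10],
     '2019-07-26 8:00:00': [0, 2, 3, 0, 3, 5, 0, 1, 9, 10],
     '2019-07-26 16:00:00': [0, 1, 3, 5, 6, 7, 0, 2, 9, 1],
     '2019-07-27 21:00:00': [0, 2, 3, 5, 3, 7, 0, 1, 4, 10]}


def constant_items(data):
    """Return the items whose sales are identical at every timestamp.

    Hypothetical helper for illustration; assumes an 'item' key and
    treats every other key as a timestamp column.
    """
    timestamp_cols = [key for key in data if key != 'item']
    result = []
    for idx, item in enumerate(data['item']):
        # Collect the distinct sales values of this item across timestamps.
        values = {data[col][idx] for col in timestamp_cols}
        if len(values) == 1:
            result.append(item)
    return result


print(constant_items(d))  # [1, 3, 7]
```

I am not sure whether flat series like these affect shape-based clustering, but I mention it for completeness.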
After this, I created train and test datasets and applied the k-Shape algorithm:
import numpy as np
from tslearn.utils import to_time_series_dataset
from tslearn.preprocessing import TimeSeriesScalerMeanVariance
from sklearn.model_selection import train_test_split
from tslearn.clustering import KShape
from sklearn.metrics import adjusted_rand_score
data_train = df.iloc[:3,:]
data_test = df.iloc[:3,:]
data_joined = np.concatenate((data_train, data_test), axis = 0)
# separate by train and test data
data_train, data_test = train_test_split(data_joined, test_size = 0.2, random_state = 888)
# transform to timeseries
X_train = to_time_series_dataset(data_train[:, 1:])
X_test = to_time_series_dataset(data_test[:, 1:])
# y train and y test
# the built-in int is used because the np.int alias was removed in NumPy 1.24
y_train = data_train[:, 0].astype(int)
y_test = data_test[:, 0].astype(int)
# scale X_train and X_test
X_train = TimeSeriesScalerMeanVariance(mu=0, std = 1).fit_transform(X_train)
X_test = TimeSeriesScalerMeanVariance(mu=0, std = 1).fit_transform(X_test)
# set up the k-Shape model
ks = KShape(n_clusters=3, max_iter=100, n_init=100, verbose=0, random_state=888)
# fit the model on the training series
ks.fit(X_train)
# predict cluster labels for the same training series
preds = ks.predict(X_train)
# get the adjusted_rand_score
adjusted_rand_score(y_train, preds)
But the adjusted Rand score was 0. What am I doing wrong?
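One thing I noticed while checking: the "true" labels I pass to the score are item IDs, and in the full dataframe every item ID is distinct. Below is a minimal pure-Python sketch of the adjusted Rand index, written from the standard pair-counting formula. The helper `ari` is my own, not the scikit-learn implementation, and returning 1.0 in the degenerate cases is my assumption. It suggests that when every true label is unique, the score is 0 whenever the clustering puts at least two samples in the same cluster, so this may be related to what I am seeing.

```python
from collections import Counter
from math import comb


def ari(labels_true, labels_pred):
    """Adjusted Rand index via pair counting (illustrative helper)."""
    n = len(labels_true)
    total_pairs = comb(n, 2)
    if total_pairs == 0:
        return 1.0  # assumption: fewer than two samples counts as perfect agreement

    # Pairs of samples that share a label in both, in true only, in pred only.
    pairs_joint = sum(comb(c, 2) for c in Counter(zip(labels_true, labels_pred)).values())
    pairs_true = sum(comb(c, 2) for c in Counter(labels_true).values())
    pairs_pred = sum(comb(c, 2) for c in Counter(labels_pred).values())

    expected = pairs_true * pairs_pred / total_pairs
    max_index = (pairs_true + pairs_pred) / 2
    if max_index == expected:
        return 1.0  # assumption: both labelings are trivial and identical in structure
    return (pairs_joint - expected) / (max_index - expected)


# Every true label is unique (like item IDs), clusters hold two samples each.
print(ari([1, 2, 3, 4], [0, 0, 1, 1]))  # 0.0

# Same grouping under different label names scores as perfect agreement.
print(ari([0, 0, 1, 1], [1, 1, 0, 0]))  # 1.0
```

If that is the cause, I suppose I would need real group labels for the items rather than their IDs, but I do not have any for this data.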