I have a dataset of 38 apartments and their electricity consumption in the morning, afternoon and evening. I am trying to cluster this dataset using the k-Means implementation from scikit-learn, and am getting some interesting results.
First clustering results:
This is all very well, and with 4 clusters I obviously get 4 labels assigned to the apartments: 0, 1, 2 and 3. Using the random_state parameter of the KMeans class, I can fix the seed used to randomly initialize the centroids, so I consistently get the same labels assigned to the same apartments.
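To illustrate the reproducibility I mean, here is a minimal sketch. The data is a synthetic stand-in (38 rows, 3 columns for morning, afternoon and evening), and the seed value 42 and `n_init=10` are arbitrary choices for the example, not my actual settings.

```python
import numpy as np
from sklearn.cluster import KMeans

# Synthetic stand-in for the real data (assumption):
# 38 apartments x 3 periods (morning, afternoon, evening).
rng = np.random.default_rng(0)
X = rng.uniform(0.5, 5.0, size=(38, 3))

# Two independent fits with the same random_state (42 is arbitrary).
km_a = KMeans(n_clusters=4, n_init=10, random_state=42).fit(X)
km_b = KMeans(n_clusters=4, n_init=10, random_state=42).fit(X)

# With the seed fixed, both runs assign identical labels.
same_labels = np.array_equal(km_a.labels_, km_b.labels_)
print(same_labels)
```

The label numbers themselves are still arbitrary: fixing the seed makes them repeatable, but it does not tie them to the consumption level.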
However, since this specific case concerns energy consumption, the clusters can be ranked measurably from the lowest to the highest consumers. I would therefore like to assign label 0 to the apartments with the lowest consumption level, label 1 to the apartments that consume a bit more, and so on.
As of now, my labels are [2 1 3 0], or ["black", "green", "blue", "red"]; I would like them to be [0 1 2 3] or ["red", "green", "black", "blue"]. How should I proceed to do so, while still keeping the centroid initialization random (with fixed seed)?
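One possible approach, sketched under assumptions: leave the fit untouched and remap the labels afterwards, ranking the clusters by the total consumption of their centroids. The data below is synthetic, the seed is arbitrary, and summing the three periods is just one way to define "consumption level"; the names `remap`, `ordered_labels` and `ordered_centers` are made up for the example.

```python
import numpy as np
from sklearn.cluster import KMeans

# Synthetic stand-in for the real data (assumption):
# 38 apartments x 3 periods (morning, afternoon, evening).
rng = np.random.default_rng(0)
X = rng.uniform(0.5, 5.0, size=(38, 3))

# Random initialization with a fixed seed, as in the question.
km = KMeans(n_clusters=4, n_init=10, random_state=42).fit(X)

# Rank clusters by total centroid consumption (assumed ranking criterion).
centroid_totals = km.cluster_centers_.sum(axis=1)
order = np.argsort(centroid_totals)      # order[new_label] = old_label

# Invert the permutation so it can be indexed by the old label.
remap = np.empty_like(order)
remap[order] = np.arange(len(order))     # remap[old_label] = new_label

ordered_labels = remap[km.labels_]       # 0 = lowest consumers
ordered_centers = km.cluster_centers_[order]

# Colors indexed by the new label, in the order I would like to get.
palette = np.array(["red", "green", "black", "blue"])
colors = palette[ordered_labels]
```

Because only the label numbers are permuted after fitting, the cluster memberships and the seeded random initialization stay exactly as they were.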
Thank you very much for the help!