I'm using k-means clustering and I have no knowledge of the true labels of the data. After applying PCA, I ended up with 4 clusters. However, the clusters appear to be imbalanced. How can I fix this class imbalance problem in an unsupervised learning task?
What makes you think the clusters should be balanced? – Chris Oct 29 '20 at 13:09
I think I have a supervised-learning mindset, in which dealing with class imbalance is a must. I actually don't know whether that is also the case in unsupervised learning... Could you clarify this a bit for me? – Kimia Gharib Oct 29 '20 at 13:27
If you plan on using these clusters as a feature in another model, the class imbalance could be an issue, but for the sake of separating the data into the n most distinct groups, k-means doesn't care about balancing. If you had one house that cost $1M and 500 that cost $100k each, the single expensive house should be in its own cluster. – Chris Oct 29 '20 at 13:37
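To make the house example concrete, here is a minimal sketch. It assumes made-up prices and uses a hand-written 1-D version of Lloyd's algorithm (the function `kmeans_1d` is hypothetical, not scikit-learn's `KMeans`), with a deterministic initialisation chosen only to keep the output reproducible. It is meant to illustrate that the resulting cluster sizes follow the data rather than any balance constraint.

```python
from collections import Counter


def kmeans_1d(values, k, n_iter=100):
    """Plain Lloyd's algorithm on 1-D data (illustrative sketch, requires k >= 2)."""
    # Assumed initialisation: spread the starting centroids evenly over the data range.
    lo, hi = min(values), max(values)
    centroids = [lo + (hi - lo) * i / (k - 1) for i in range(k)]
    labels = []
    for _ in range(n_iter):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: (v - centroids[j]) ** 2) for v in values
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so the algorithm has converged
        labels = new_labels
        # Update step: each centroid moves to the mean of the points assigned to it.
        for j in range(k):
            members = [v for v, lab in zip(values, labels) if lab == j]
            if members:
                centroids[j] = sum(members) / len(members)
    return labels, centroids


# Hypothetical prices: 500 houses at $100k each and one house at $1M.
prices = [100_000.0] * 500 + [1_000_000.0]
labels, centroids = kmeans_1d(prices, k=2)

cluster_sizes = Counter(labels)
print(cluster_sizes)  # one cluster holds 500 houses, the other holds 1
print(centroids)
```

The 500-to-1 split is the correct answer for this data: forcing the two clusters to be the same size would mix the expensive house in with houses that look nothing like it.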
Thanks Chris, that makes sense. I would actually like to use the clusters and feed them into a predictive model later on. Do you have any suggestions on how I should deal with the class imbalance in this scenario? – Kimia Gharib Oct 29 '20 at 19:49