Correct manual clusters to make them more relevant

Question

I have clusters that were built manually. I know that machine learning algorithms aim to segment profiles efficiently, but I don't want to lose the clusters I have. I just want to change them slightly to make them more relevant. Surely the result won't be as good as K-means output. Do you know of any methods that start from existing clusters and try to optimize or correct them? Many thanks.

Say you cooked a meal (assume chicken soup), but you do not like its taste or its color. Now you're asking passersby how to change your cooked meal. How on earth is the passerby (`read, SO community`) supposed to know what ingredients (`read variables or features`) you've used to cook (`read, the program code`) the meal? A very crude example, but the underlying message is, "The question is too broad". Try to provide more information on the variable/feature data types, or on what you mean by `relevant clusters`. — mnm, Jun 08 '18 at 12:28

score 0 · Answer 1 · answered Jun 09 '18 at 12:01

Automatic clusters tend to be worse for all practical purposes than anything you labeled manually.

So I don't think you need to "optimize" them.

But there are some obvious approaches:

For methods such as KMeans and PAM you can use the centers of your manual clusters as the initial centroids (or medoids, for PAM). Just make sure the clusters don't degrade...
There are constrained clustering algorithms, where you can use your existing labels as constraints, and have the clustering algorithm find the solution with the best agreement.
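
As a rough illustration of the first approach, here is a minimal pure-Python sketch of K-means-style refinement seeded from manual labels. The data, the label names and the helper functions `centroids_from_labels` and `refine` are all made up for this example; it assumes numeric features and Euclidean distance, which may not fit your data. It also reports which points changed cluster, so you can check that your manual clusters did not degrade.

```python
from math import dist  # Euclidean distance, Python 3.8+


def centroids_from_labels(points, labels):
    """Return the mean point of each cluster, keyed by its label."""
    groups = {}
    for p, lab in zip(points, labels):
        groups.setdefault(lab, []).append(p)
    return {
        lab: tuple(sum(coord) / len(ps) for coord in zip(*ps))
        for lab, ps in groups.items()
    }


def refine(points, manual_labels, max_iter=10):
    """Lloyd-style iterations that start from the manual labels.

    A cluster that loses all of its points simply disappears here;
    a real implementation would need to decide how to handle that.
    """
    labels = list(manual_labels)
    for _ in range(max_iter):
        centers = centroids_from_labels(points, labels)
        # Reassign every point to its nearest current center.
        new_labels = [
            min(centers, key=lambda lab: dist(p, centers[lab])) for p in points
        ]
        if new_labels == labels:  # converged, nothing moved
            break
        labels = new_labels
    return labels


# Hypothetical toy data: two obvious groups, one point labelled oddly.
points = [(0.0, 0.0), (0.2, 0.1), (0.1, 0.3), (5.0, 5.0), (5.2, 4.9), (4.8, 5.1)]
manual = ["a", "a", "a", "b", "b", "a"]

refined = refine(points, manual)
changed = [i for i, (m, r) in enumerate(zip(manual, refined)) if m != r]
print(refined)  # ['a', 'a', 'a', 'b', 'b', 'b']
print(changed)  # [5]
```

Keeping `max_iter` small limits how far the result can drift from your manual clusters. If you use scikit-learn, `KMeans` accepts an array of starting centers through its `init` parameter, which achieves the same seeding.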

But don't overestimate clustering. It is very sensitive to parameters, preprocessing, normalization, and so on; it's not that reliable.
