
In the R language, is there a predict function for clustering, like the one we have for classification? What can we conclude from the clustering graph result that we get from R, other than comparing two clusters?

user904522

  • Quoting from the text behind the "clustering" tag: Clustering has 2 meanings; please use the tag [Computer-clustering](http://en.wikipedia.org/wiki/Computer_cluster) or [Data-clustering](http://en.wikipedia.org/wiki/Data_clustering), in addition to Clustering. For data-clustering, giving sizes -- Ndata, Ndimension, Ncluster -- will help people to give better answers. – Hot Licks Nov 13 '11 at 14:25
  • You need to specify which functions you have been using. If this is a very general question, then you should probably be going to the CRAN Task View: http://cran.r-project.org/web/views/Cluster.html At the moment the question is far too general to be answered and should probably be closed. – IRTFM Nov 13 '11 at 18:07

2 Answers


Clustering does not pay attention to prediction capabilities. It just tries to find objects that seem to be related. That is why there is no "predict" function for clustering results.

However, in many situations, learning classifiers based on the clusters offers improved performance. For this, you essentially train a classifier to assign a new object to the appropriate cluster, then classify it using a classifier trained only on examples from that cluster. When the cluster is pure, you can even skip this second step.

The reason is the following: there may be multiple distinct types of objects that carry the same class label. Training a classifier on the full data set may be hard, because it will try to learn both clusters at the same time. Splitting such a class into its clusters, and training a separate classifier for each, can make the task significantly easier.
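The approach above can be sketched in base R. This is a minimal illustration, not from the original answer: `kmeans` stands in for any clustering method, and a per-cluster majority-vote label stands in for a real per-cluster classifier (e.g. `rpart` or `glm`).

```r
# Sketch: cluster the data, build one "classifier" per cluster,
# then route a new point to its nearest cluster and classify within it.
set.seed(42)
train  <- iris[, 1:4]
labels <- iris$Species
km <- kmeans(train, centers = 3)

# One classifier per cluster; here a simple majority-vote label
# stands in for any real model trained on that cluster's examples.
per_cluster_label <- sapply(1:3, function(k) {
  names(which.max(table(labels[km$cluster == k])))
})

# Assign a new observation to the nearest cluster center (squared
# Euclidean distance), then use that cluster's classifier.
classify <- function(x) {
  d <- apply(km$centers, 1, function(ctr) sum((x - ctr)^2))
  per_cluster_label[which.min(d)]
}

classify(unlist(train[1, ]))
```

Because the setosa cluster in iris is pure, step two is trivial for it, which matches the "skip the second step" remark in the answer.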

Anony-Mousse

Many packages offer predict methods for cluster objects. One such example is clue, with cl_predict.
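For instance, a sketch of cl_predict on a kmeans fit (this assumes the clue package is installed; clue provides cl_predict methods for partitions produced by several clustering functions, including kmeans):

```r
# Assign new observations to clusters from an existing kmeans fit
# using clue's cl_predict (assumes install.packages("clue") was run).
library(clue)

set.seed(1)
km <- kmeans(iris[, 1:4], centers = 3)

# Three new points (here, rows re-used from iris for illustration).
new_points <- iris[c(1, 51, 101), 1:4]
cl_predict(km, newdata = new_points)
```

cl_predict returns class ids for the new data, so you get one cluster assignment per row of `newdata`.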

The best practice when doing this is to apply the same rules that were used to cluster the training data. For example, in Kernel K-Means you should compute the kernel distance between your data point and the cluster centers; the minimum determines the cluster assignment (see here for example). In Spectral Clustering you should project your data point's dissimilarity onto the eigenfunctions of the training data, compare the Euclidean distance to the K-Means centers in that space, and the minimum should determine your cluster assignment (see here for example).
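For plain K-Means, the "same rules" idea reduces to nearest-center assignment, which can be written out directly. A minimal sketch (the function name `predict_kmeans` is my own; Lloyd's algorithm is forced so that the converged assignment is exactly nearest-center):

```r
# Hand-rolled predict for kmeans: Euclidean distance from each new
# point to every cluster center; the minimum gives the assignment.
set.seed(1)
km <- kmeans(iris[, 1:4], centers = 3,
             algorithm = "Lloyd", iter.max = 100)

predict_kmeans <- function(km_fit, newdata) {
  apply(as.matrix(newdata), 1, function(x) {
    # squared distances to all centers (t(centers) is features x k)
    which.min(colSums((t(km_fit$centers) - x)^2))
  })
}

# Sanity check: a converged Lloyd fit assigns every training point
# to its nearest center, so this reproduces km$cluster exactly.
all(predict_kmeans(km, iris[, 1:4]) == km$cluster)  # TRUE
```

The same skeleton carries over to the kernel and spectral cases in the answer; only the distance computation changes.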

catastrophic-failure