I am exploring the R programming environment for clustering analysis on my test data. For testing I am using a single-column data set; its scatter plot and histogram, plotted against the value index, are shown below.

enter image description here

enter image description here

From the data, I think the values can be partitioned into 7 clusters. When I call the kmeans function with the number of clusters set to 7, I get the following result.

Within cluster sum of squares by cluster:
[1]    492.480   2979.013   1903.396  18682.262   1430.533 754221.504
 (between_SS / total_SS =  98.3 %)

My question is how to store this result (not necessarily in R) so that, when I get a new data set, I can compare the input data against the stored clustering result. That is, I want to assign the values of the input data set to the already known clusters.

Soumajit

2 Answers


Examine the Value section of help(kmeans). The centers component gives the center (the mean) of each cluster. For incoming data, compute which center each point is closest to. Example:

data(mtcars)
mt.k <- kmeans(mtcars, centers = 4)
mt.k$centers
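The "closest center" step can be sketched as follows. This is only a minimal illustration under stated assumptions: the training values are synthetic, and assign_cluster is a hypothetical helper written for this example, not part of the kmeans API. It uses squared Euclidean distance, which matches the criterion kmeans minimizes.

```r
# Fit k-means on a single-column data set (synthetic values, for illustration only)
set.seed(42)
train <- data.frame(value = c(rnorm(50, mean = 0),
                              rnorm(50, mean = 10),
                              rnorm(50, mean = 20)))
fit <- kmeans(train, centers = 3, nstart = 10)

# Hypothetical helper: for each row of `newdata`, return the index of the
# nearest row of `centers` (squared Euclidean distance)
assign_cluster <- function(newdata, centers) {
  newdata <- as.matrix(newdata)
  # One column of distances per center
  d <- sapply(seq_len(nrow(centers)), function(j) {
    rowSums(sweep(newdata, 2, centers[j, ], "-")^2)
  })
  d <- matrix(d, nrow = nrow(newdata))  # keep matrix shape even for a single row
  max.col(-d, ties.method = "first")    # smallest distance = largest negated value
}

# New incoming values are labelled using the stored centers only
new_values <- data.frame(value = c(-0.5, 9.7, 21.3))
new_clusters <- assign_cluster(new_values, fit$centers)
```

Only fit$centers is needed at prediction time, so the centers matrix is the piece worth storing.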
vpipkt

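It's not immediately obvious how to deal with kmeans objects. The easiest thing to do is attach the cluster assignments to your data frame as a new column:

 k = kmeans(data, centers = 7)
 # add the cluster labels as a column (assigning to `data` itself would overwrite the data)
 data$cluster = k$cluster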

Now you have the cluster number as a column in the data.frame. Save however you'd save a data.frame.
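As one possible way to do the saving, here is a small sketch using base R. The data are synthetic and the file paths come from tempfile(), both assumptions made only for illustration. saveRDS keeps the whole kmeans object for later R sessions, while a CSV of the centers can also be read outside R.

```r
# Synthetic single-column data, for illustration only
set.seed(1)
data <- data.frame(value = c(rnorm(30, mean = 0), rnorm(30, mean = 10)))
k <- kmeans(data, centers = 2, nstart = 10)

# Option 1: store the whole kmeans object and restore it in a later R session
rds_path <- tempfile(fileext = ".rds")   # placeholder path
saveRDS(k, rds_path)
k_restored <- readRDS(rds_path)

# Option 2: store only the centers as CSV, which other tools can read too
csv_path <- tempfile(fileext = ".csv")   # placeholder path
write.csv(k$centers, csv_path, row.names = TRUE)
centers_restored <- as.matrix(read.csv(csv_path, row.names = 1))
```

The restored centers are enough to label new values by nearest center, so the original training data does not have to be kept.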

Señor O