Storing a k-means clustering result for later use

Question

I am exploring the R programming environment for clustering analysis on my test data. For testing I am using a single-column data set; its scatter plot and histogram, plotted against the value index, are shown below.

From the data I feel the values can be partitioned into 7 clusters. When I call the kmeans function with the number of clusters set to 7, I get the following result.

Within cluster sum of squares by cluster:
[1] 492.480 2979.013 1903.396 18682.262 1430.533 754221.504
 (between_SS / total_SS = 98.3 %)

My question is how to store this result (not necessarily in R) so that, when I get a new data set, I can compare the input data against the stored clustering result. In other words, I want to assign the new values to the already known clusters.

Why are you trying to "cluster" data that's already in 7 discrete values?? — Señor O, Mar 13 '15 at 15:01
@SeñorO eight actually ;-) but I agree the example presented is not really helpful. — agenis, Mar 13 '15 at 15:06
Señor O, this is a test data set. The real data would be denser. — Soumajit, Mar 13 '15 at 15:06
If your purpose is to validate the stability of your k-means clustering, I suggest the clValid package (see its documentation). — agenis, Mar 13 '15 at 15:18
You could also try the mlr package, see [the tutorial](http://berndbischl.github.io/mlr/tutorial/html/predict/index.html). — Lars Kotthoff, Mar 13 '15 at 15:21

score 1 · Answer 1 · answered Mar 13 '15 at 15:25

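Examine the Value section of help(kmeans). The centers component gives the coordinates of each cluster center, that is, the mean of the points assigned to that cluster. For incoming data, compute which center each point is closest to and assign it to that cluster. Example: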

data(mtcars)
mt.k <- kmeans(mtcars, centers = 4)
mt.k$centers
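
The nearest-center step described above can be sketched as follows. This is only an illustration under stated assumptions: Euclidean distance, a temporary file as the storage location, and a helper named assign_cluster that is hypothetical (it is not part of base R or of kmeans).

```r
# Fit once and keep only what is needed later: the matrix of centers.
set.seed(42)                         # kmeans starts from random centers
mt.k <- kmeans(mtcars, centers = 4)

# Persist the centers; a temporary file is used here purely for illustration.
centers_file <- tempfile(fileext = ".rds")
saveRDS(mt.k$centers, centers_file)

# Hypothetical helper: for each row of new_data, return the index of the
# closest center, measured by squared Euclidean distance.
assign_cluster <- function(new_data, centers) {
  # Use the same columns, in the same order, as the stored centers.
  new_data <- as.matrix(new_data)[, colnames(centers), drop = FALSE]
  apply(new_data, 1, function(row) {
    which.min(colSums((t(centers) - row)^2))
  })
}

# Later, possibly in another session: reload the centers and label new rows.
centers <- readRDS(centers_file)
new_rows <- mtcars[1:5, ]            # stand-in for newly arrived data
new_labels <- assign_cluster(new_rows, centers)
```

Because the centers are just a numeric matrix, they could equally be written with write.csv and read from any other tool, which matches the "not necessarily in R" requirement in the question.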

answered Mar 13 '15 at 15:25 by vpipkt

score 1 · Answer 2 · answered Mar 13 '15 at 15:25

It's not immediately obvious how to deal with kmeans objects. The easiest thing to do is attach the cluster assignments to your data frame as a new column:

 k = kmeans(data, centers = 7)
 data$cluster = k$cluster

Now you have the cluster number as a column in the data.frame. Save however you'd save a data.frame.
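
As a rough sketch of that save step, assuming CSV as the storage format and a made-up single-column data set (the column name value and the three simulated groups are illustrative only, not the asker's data):

```r
# Hypothetical single-column data standing in for the question's values.
set.seed(1)
data <- data.frame(
  value = c(rnorm(20, mean = 0), rnorm(20, mean = 50), rnorm(20, mean = 100))
)

k <- kmeans(data$value, centers = 3)
data$cluster <- k$cluster            # add the labels as a new column

# Write the labelled data to CSV (a temporary file here) and read it back.
out_file <- tempfile(fileext = ".csv")
write.csv(data, out_file, row.names = FALSE)
restored <- read.csv(out_file)
```

Note that this stores only the labels of the rows that were clustered. To place new values into the same clusters, the centers in k$centers would need to be saved as well.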
