
I'm currently using RStudio to do text mining on support tickets, clustering them by their free-text description. For this, I am comparing k-means with the EM algorithm. I prepared the data with the tm package, and now I am trying to apply clustering algorithms to the data matrix.

With the kmeans() function, I can use the following code snippet to output the 5 most frequent terms in each text cluster (kmeans21):

for (i in 1:num_cluster) {
    cat(paste("cluster ", i, ": ", sep = ""))
    s <- sort(kmeans21$centers[i, ], decreasing = TRUE)
    cat(names(s)[1:5], "\n")
}

So far, I couldn't find a function that does the same in the mclust package. I fit the model as follows:

bic21 <- mclustBIC(m1, G = 21)
emmodel21 <- summary(bic21, data = m1)

With the command

emmodel21$classification

I can see the cluster for each support ticket. Is there also a way to output the most frequent terms per cluster, as in the first code block for kmeans?

Ben

2 Answers


I think you can try

summary(mod1, parameters = TRUE)

I just tried it on the same example as in the link:

library(mclust)
data(diabetes)
X <- diabetes[,-1]
BIC <- mclustBIC(X)
mod1 <- Mclust(X, x = BIC)
summary(mod1, parameters = TRUE)
Alexandre Gentil

Slightly altering the first example in the vignette:

library(mclust)
data(diabetes)
X <- diabetes[,-1]
mod <- Mclust(X)
means <- mod$parameters$mean

The means object is now a matrix containing the cluster means, with one row per variable and one column per cluster.
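To get something like the kmeans loop from the question, you could sort each column of that matrix and print the names of the largest entries. This is only a sketch on the diabetes data: G = 3, n_top and top_terms are my own choices for illustration, and I am assuming the mean matrix carries the variable names as row names (the code falls back to colnames(X) if it does not). Note that kmeans stores clusters in the rows of $centers, while here the clusters are in the columns.

```r
library(mclust)

data(diabetes)
X <- diabetes[, -1]

# G = 3 is picked only for this illustration
mod <- Mclust(X, G = 3)

# variables in rows, clusters in columns
means <- mod$parameters$mean
if (is.null(rownames(means))) rownames(means) <- colnames(X)

# number of top variables (terms) to report per cluster
n_top <- 2

# for each cluster, sort its mean vector and keep the names of the largest entries
top_terms <- lapply(seq_len(ncol(means)), function(k) {
  s <- sort(means[, k], decreasing = TRUE)
  names(s)[seq_len(min(n_top, length(s)))]
})

for (k in seq_along(top_terms)) {
  cat("cluster ", k, ": ", paste(top_terms[[k]], collapse = " "), "\n", sep = "")
}
```

For the model in the question, the same loop should work on emmodel21$parameters$mean with n_top set to 5, assuming the summary object exposes the parameters in the same way and the columns of m1 are named after the terms.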

user42909