I was wondering whether there is a way for the cmeans function [in package e1071] to perform the clustering using the Mahalanobis distance?
Many thanks
The e1071 package does not have a Mahalanobis option. However, you can look into the cluster package and its fanny function. As per the help page, it also computes a fuzzy clustering of the data into k clusters. With this function, you can provide your own distance matrix.
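So for Mahalanobis distance, you can calculate your distance matrix first and then run your clustering on it. Note that the base dist function has no "mahalanobis" method, so the example below whitens the data with the inverse Cholesky factor of the covariance matrix; Euclidean distances on the whitened data are Mahalanobis distances on the original data.
require(cluster)
set.seed(123)
x <- rbind(matrix(rnorm(100, sd = 0.3), ncol = 2),
           matrix(rnorm(100, mean = 1, sd = 0.3), ncol = 2))
# stats::dist() does not accept "mahalanobis", so whiten x first:
# Euclidean distance on xw equals Mahalanobis distance on x
xw <- x %*% solve(chol(cov(x)))
y <- dist(xw)
fanny(y, k = 2)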
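If you want to check the Mahalanobis distance matrix itself before clustering, here is a minimal sketch in base R. It assumes a single covariance matrix estimated from the whole data set with cov() (not one per cluster), and the names S, xw, d_maha and d12_sq are only illustrative. It builds the distances by whitening the data and compares one pair against mahalanobis(), which returns the squared distance.

```r
set.seed(123)
x <- rbind(matrix(rnorm(100, sd = 0.3), ncol = 2),
           matrix(rnorm(100, mean = 1, sd = 0.3), ncol = 2))

# Assumption: one sample covariance for the whole data set
S <- cov(x)

# chol(S) returns an upper-triangular R with t(R) %*% R == S, so
# Euclidean distance on x %*% solve(R) is Mahalanobis distance on x
xw <- x %*% solve(chol(S))
d_maha <- dist(xw)

# Cross-check the first two points; mahalanobis() gives the squared distance
d12_sq <- mahalanobis(x[1, ], center = x[2, ], cov = S)
all.equal(as.matrix(d_maha)[1, 2]^2, d12_sq, check.attributes = FALSE)
```

Since xw is ordinary numeric data, it should also be possible to pass it to cmeans directly, so that the Euclidean distance used there acts as a Mahalanobis distance on the original x. I have not compared those results here.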
Given your understandable concerns over equivalence between the functions, here is an example comparing them:
require(e1071)
cl<-cmeans(x,centers=2,iter.max=20,dist="euclidean",method="cmeans",m=2)
fl <- fanny(x, k=2, maxit=20, metric="SqEuclidean", memb.exp=2)
> head(cl$membership)
1 2
[1,] 0.9948729 0.005127121
[2,] 0.3647778 0.635222221
[3,] 0.9290126 0.070987385
[4,] 0.7588260 0.241174043
[5,] 0.9282550 0.071745007
[6,] 0.9599231 0.040076886
> head(fl$membership)
[,1] [,2]
[1,] 0.9948722 0.005127775
[2,] 0.3647890 0.635211040
[3,] 0.9290171 0.070982905
[4,] 0.7588304 0.241169649
[5,] 0.9282575 0.071742489
[6,] 0.9599221 0.040077878
Although not absolutely identical, you can see they are very close. You will also notice that fanny is specifying the squared Euclidean distance, which is what cmeans is doing. This equivalence is noted on the fanny help page (?fanny) under metric.
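To put a number on "very close", here is a small sketch that compares two membership matrices. The helper max_membership_diff is hypothetical (it is not part of e1071 or cluster) and assumes exactly two clusters; it tries both column orders because cluster labels are arbitrary. The input values are the six rows printed above.

```r
# First six membership rows from cmeans, as printed above
cl_head <- rbind(c(0.9948729, 0.005127121),
                 c(0.3647778, 0.635222221),
                 c(0.9290126, 0.070987385),
                 c(0.7588260, 0.241174043),
                 c(0.9282550, 0.071745007),
                 c(0.9599231, 0.040076886))

# First six membership rows from fanny, as printed above
fl_head <- rbind(c(0.9948722, 0.005127775),
                 c(0.3647890, 0.635211040),
                 c(0.9290171, 0.070982905),
                 c(0.7588304, 0.241169649),
                 c(0.9282575, 0.071742489),
                 c(0.9599221, 0.040077878))

# Hypothetical helper: largest absolute membership difference for a
# two-cluster solution, taking the better of the two label orderings
max_membership_diff <- function(a, b) {
  min(max(abs(a - b)), max(abs(a - b[, 2:1])))
}

max_membership_diff(cl_head, fl_head)
```

For these six rows the largest difference is on the order of 1e-5. The same helper can be applied to the full results with max_membership_diff(cl$membership, fl$membership).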