
In Python 3.6.3, I have a precomputed distance matrix D:

In[1]: D
Out[1]:
array([[0.00, 305923.00, 269966.00, 349816.00, 304120.00, 326591.00,
        341136.00, 254420.00, 228892.00, 344290.00],
       [305923.00, 0.00, 13901.00, 288851.00, 9496.00, 9793.00, 8863.00,
        11598.00, 388409.00, 9545.00],
       [269966.00, 13901.00, 0.00, 268908.00, 10595.00, 9649.00,
        11120.00, 10683.00, 468215.00, 12278.00],
       [349816.00, 288851.00, 268908.00, 0.00, 275312.00, 277246.00,
        285087.00, 267596.00, 309412.00, 293227.00],
       [304120.00, 9496.00, 10595.00, 275312.00, 0.00, 8569.00, 8765.00,
        10600.00, 418165.00, 8714.00],
       [326591.00, 9793.00, 9649.00, 277246.00, 8569.00, 0.00, 8473.00,
        9147.00, 464342.00, 8777.00],
       [341136.00, 8863.00, 11120.00, 285087.00, 8765.00, 8473.00, 0.00,
        9981.00, 476542.00, 7791.00],
       [254420.00, 11598.00, 10683.00, 267596.00, 10600.00, 9147.00,
        9981.00, 0.00, 331620.00, 9285.00],
       [228892.00, 388409.00, 468215.00, 309412.00, 418165.00, 464342.00,
        476542.00, 331620.00, 0.00, 516956.00],
       [344290.00, 9545.00, 12278.00, 293227.00, 8714.00, 8777.00,
        7791.00, 9285.00, 516956.00, 0.00]])

I am trying to plot this matrix in order to visualize the clusters. I am using sklearn.manifold.MDS() for this, following the first example given here:

from sklearn import manifold
mds = manifold.MDS(n_components=2, dissimilarity='precomputed')
# Returns the embedded coordinates in the D1, D2 space.
# The distances between points are from the distance matrix D.
X_r = mds.fit_transform(D)

### graph
import matplotlib.pyplot as plt
k = 2  # the number of clusters
fig = plt.figure(figsize=(11, 9))
ax = fig.add_subplot(1, 1, 1)
colors = ('red', 'blue', 'green', 'yellow', 'k', 'grey')
for label, color in zip(range(k), colors):
    position = k == label
    ax.scatter(X_r[position, 0], X_r[position, 1], label="Cluster {0}".format(label), color=color)

ax.set_xlabel("Dimension 1", fontsize=14)
ax.set_ylabel("Dimension 2", fontsize=14)
ax.legend(loc="best",fontsize=14)
ax.set_title("MDS", fontsize=16)
plt.xlim(-300000,300000)
plt.ylim(-300000,300000)
plt.show()

However, my plot is empty because position = (k == label) always evaluates to False: k is the scalar 2, while label only takes the values 0 and 1. I expected to be able to visualize the two clusters.

FaCoffee
  • MDS is not a clustering algorithm, and I don't see how you are trying to use it. The example in the link you posted works because it has a **target** variable (a label, `y`); your code doesn't have one. – Qusai Alothman Sep 08 '18 at 23:43
  • Thanks! I suspected that this could be missing. So, how do you suggest creating a target variable? Basically, this has to do with identifying the different clusters before the distance matrix is passed on to MDS. How about k-medoids? – FaCoffee Sep 09 '18 at 11:13
  • You don't need to create a target variable; clustering should do the job. Why not cluster the data before computing the distance matrix? Why do you need the distance matrix anyway? Some clustering algorithms compute that internally. As for which one to choose, that entirely depends on your data and use case. – Qusai Alothman Sep 09 '18 at 11:52
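
To make the suggestion in the comments concrete, here is a minimal sketch of one way to get per-point labels and use them as the mask in the plotting loop. It assumes average-linkage hierarchical clustering from SciPy (`linkage`, `fcluster`) run directly on the precomputed matrix. That choice of algorithm is an assumption for illustration only; it is neither the k-medoids idea raised above nor anything the original code used. The `random_state=0` argument is also an addition, there only to make the MDS layout repeatable.

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform
from sklearn import manifold

# Distance matrix D copied from the question (symmetric, zero diagonal).
D = np.array([
    [0, 305923, 269966, 349816, 304120, 326591, 341136, 254420, 228892, 344290],
    [305923, 0, 13901, 288851, 9496, 9793, 8863, 11598, 388409, 9545],
    [269966, 13901, 0, 268908, 10595, 9649, 11120, 10683, 468215, 12278],
    [349816, 288851, 268908, 0, 275312, 277246, 285087, 267596, 309412, 293227],
    [304120, 9496, 10595, 275312, 0, 8569, 8765, 10600, 418165, 8714],
    [326591, 9793, 9649, 277246, 8569, 0, 8473, 9147, 464342, 8777],
    [341136, 8863, 11120, 285087, 8765, 8473, 0, 9981, 476542, 7791],
    [254420, 11598, 10683, 267596, 10600, 9147, 9981, 0, 331620, 9285],
    [228892, 388409, 468215, 309412, 418165, 464342, 476542, 331620, 0, 516956],
    [344290, 9545, 12278, 293227, 8714, 8777, 7791, 9285, 516956, 0],
], dtype=float)

k = 2  # the number of clusters

# Assumption (not in the original post): average-linkage hierarchical
# clustering on the precomputed distances. linkage() expects the condensed
# (upper-triangle) form, which squareform() produces from the square matrix.
Z = linkage(squareform(D), method="average")

# Cut the tree into at most k flat clusters; fcluster labels start at 1,
# so subtract 1 to get 0-based labels that match range(k) below.
labels = fcluster(Z, t=k, criterion="maxclust") - 1

# Embed the points in 2-D; MDS only provides coordinates, not clusters.
mds = manifold.MDS(n_components=2, dissimilarity="precomputed", random_state=0)
X_r = mds.fit_transform(D)

fig, ax = plt.subplots(figsize=(11, 9))
colors = ("red", "blue", "green", "yellow", "k", "grey")
for label, color in zip(range(k), colors):
    # Boolean mask with one entry per point, instead of the scalar k == label.
    position = labels == label
    ax.scatter(X_r[position, 0], X_r[position, 1],
               label="Cluster {0}".format(label), color=color)

ax.set_xlabel("Dimension 1", fontsize=14)
ax.set_ylabel("Dimension 2", fontsize=14)
ax.legend(loc="best", fontsize=14)
ax.set_title("MDS", fontsize=16)

if __name__ == "__main__":
    plt.show()
```

The key change is that `position` is now an array comparison (`labels == label`), so it selects the rows of `X_r` belonging to each cluster. The fixed axis limits from the question are left out here so that matplotlib autoscales; with distances above 500000 in `D`, limits of ±300000 could clip some points. Any other clustering method that returns one label per point could be substituted for the `linkage`/`fcluster` step.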
