
I have a set of n_samples data points, each with n_features features (on the order of hundreds or thousands). I use K-Means with Euclidean distance to cluster the points into n_clusters clusters. I then use t-SNE to reduce the high-dimensional input X (n_samples x n_features) to X_low_dim (n_samples x 2) so that I can visualize the data in two dimensions. Is there an easy way to draw distance contours around the cluster centers in Python?

user2725109
    The reason you did not get an answer is probably that the question is too broad and rather unclear. Opening a bounty on it will not change that. Read [ask] and write a clear problem description. – ImportanceOfBeingErnest Jan 22 '18 at 17:30

2 Answers


There is an ambiguity in your question: if you project your n-dimensional data onto a 2-dimensional manifold, each 2D point can correspond to multiple original points at different distances from a cluster center.

So, to get a single distance value at each 2D point, you have to lay a grid over the 2D embedding and use plain Euclidean distance within it. For nearby points this should stay reasonably close to the original distance, because t-SNE tries to preserve local neighborhoods; distances between far-apart points are less reliable.
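A minimal sketch of that idea, under a few assumptions that are not in the question: the helper names `distance_grid` and `plot_distance_contours` are hypothetical, and each cluster's 2D "center" is taken to be the mean of its embedded points (scikit-learn's `TSNE` has no `transform` method, so the original K-Means centers cannot be projected after the fact). The contour value at each grid node is the Euclidean distance to the nearest 2D center.

```python
import numpy as np


def distance_grid(centers_2d, points_2d, resolution=200, margin=1.0):
    """Euclidean distance from every grid node to its nearest 2D center."""
    centers_2d = np.asarray(centers_2d, dtype=float)
    points_2d = np.asarray(points_2d, dtype=float)
    # Grid bounds cover all embedded points plus a small margin.
    xmin, ymin = points_2d.min(axis=0) - margin
    xmax, ymax = points_2d.max(axis=0) + margin
    xx, yy = np.meshgrid(np.linspace(xmin, xmax, resolution),
                         np.linspace(ymin, ymax, resolution))
    grid = np.column_stack([xx.ravel(), yy.ravel()])  # (resolution**2, 2)
    # Pairwise node-to-center distances, shape (resolution**2, n_centers).
    dists = np.linalg.norm(grid[:, None, :] - centers_2d[None, :, :], axis=2)
    return xx, yy, dists.min(axis=1).reshape(xx.shape)


def plot_distance_contours(points_2d, labels, levels=10):
    """Scatter the embedding and overlay distance contours around each center."""
    import matplotlib.pyplot as plt  # imported lazily; only needed for plotting

    points_2d = np.asarray(points_2d, dtype=float)
    labels = np.asarray(labels)
    # Assumption: 2D center of a cluster = mean of its embedded points.
    centers_2d = np.array([points_2d[labels == k].mean(axis=0)
                           for k in np.unique(labels)])
    xx, yy, dd = distance_grid(centers_2d, points_2d)
    fig, ax = plt.subplots()
    ax.scatter(points_2d[:, 0], points_2d[:, 1], c=labels, s=5)
    cs = ax.contour(xx, yy, dd, levels=levels, colors="k", linewidths=0.5)
    ax.clabel(cs, inline=True, fontsize=7)
    ax.scatter(centers_2d[:, 0], centers_2d[:, 1], marker="x", c="r")
    return fig, ax
```

With the question's variables, this might be called as `plot_distance_contours(X_low_dim, labels)` followed by `plt.show()`, where `labels` would be the integer cluster labels from K-Means. The contour values are distances in the 2D embedding, not in the original feature space.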

David Dale

I don't know whether I misunderstood the question or the others did, but if I understood correctly, you want to draw contour plots centered on the projections of your cluster representatives.
You can look here for a general approach to contour plots; taking that code almost verbatim, you could do something like this:

import numpy as np
import matplotlib.pyplot as plt
from matplotlib import cm 
import scipy.stats as st

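def contour_cloud(x, y, cmap):
    # Plot limits; adjust these so they cover your data
    xmin, xmax = -10, 10
    ymin, ymax = -10, 10

    # 100 x 100 evaluation grid over the plot area
    xx, yy = np.mgrid[xmin:xmax:100j, ymin:ymax:100j]
    positions = np.vstack([xx.ravel(), yy.ravel()])
    # Fit a Gaussian kernel density estimate to the cluster's points
    values = np.vstack([x, y])
    kernel = st.gaussian_kde(values)
    # Evaluate the density on the grid and restore the grid shape
    f = np.reshape(kernel(positions).T, xx.shape)

    # Filled, semi-transparent density contours
    plt.contourf(xx, yy, f, cmap=cmap, alpha=0.5)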

# Assuming to have 2 clusters, split the points into two subsets
representative_1 = ...  # Shape (2, )
cluster_1 = ...         # Shape (n_points_cl_1, 2)
representative_2 = ...  # Shape (2, )
cluster_2 = ...         # Shape (n_points_cl_2, 2)

plt.scatter(x=representative_1[0], y=representative_1[1], c='b')
plt.scatter(x=representative_2[0], y=representative_2[1], c='r')

contour_cloud(x=cluster_1[:, 0], y=cluster_1[:, 1], cmap=cm.Blues)
contour_cloud(x=cluster_2[:, 0], y=cluster_2[:, 1], cmap=cm.Reds)

plt.show()

Set xmin, xmax, ymin, and ymax according to the range of your data.

This will output something along these lines:

[Image: contour plot of the two clusters]

Play with the parameters to fit your needs; I threw this together in 5 minutes, so it's not really pretty. In the plot above, I sampled 1000 points from two different normal distributions and used their means ((0, 0) and (10, 10)) as representatives.
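For reference, a sketch of how such test data could be generated and fed to the same kind of density estimate. Several things here are assumptions rather than details from the answer: unit variance for both distributions, 1000 points per cluster, the fixed random seed, and the hypothetical helper `kde_on_grid`. The grid bounds are derived from the data, since limits of -10 to 10 would cut off a cluster centered at (10, 10).

```python
import numpy as np
import scipy.stats as st

rng = np.random.default_rng(0)  # fixed seed, only for reproducibility

# Assumed setup: unit-variance normals, 1000 points per cluster.
mean_1, mean_2 = np.array([0.0, 0.0]), np.array([10.0, 10.0])
cluster_1 = rng.normal(loc=mean_1, scale=1.0, size=(1000, 2))
cluster_2 = rng.normal(loc=mean_2, scale=1.0, size=(1000, 2))


def kde_on_grid(cluster, bounds, n=100):
    """Evaluate a Gaussian KDE of `cluster` on an n x n grid over `bounds`."""
    xmin, xmax, ymin, ymax = bounds
    xx, yy = np.mgrid[xmin:xmax:n * 1j, ymin:ymax:n * 1j]
    positions = np.vstack([xx.ravel(), yy.ravel()])
    kernel = st.gaussian_kde(cluster.T)  # gaussian_kde expects shape (2, n_points)
    return xx, yy, kernel(positions).reshape(xx.shape)


# Shared bounds covering both clusters, with a margin of 1 on every side.
both = np.vstack([cluster_1, cluster_2])
bounds = (both[:, 0].min() - 1, both[:, 0].max() + 1,
          both[:, 1].min() - 1, both[:, 1].max() + 1)

xx, yy, f1 = kde_on_grid(cluster_1, bounds)
_, _, f2 = kde_on_grid(cluster_2, bounds)
```

The arrays `f1` and `f2` can then be passed to `plt.contourf(xx, yy, f1, cmap=cm.Blues, alpha=0.5)` and the equivalent call with `cm.Reds`, with `mean_1` and `mean_2` plotted as the representatives.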

Daniele Grattarola