Why does the t-SNE method use Euclidean distance to compute similarities in high-dimensional data?

Question

I have tried other distance metrics, such as Chebyshev distance and Manhattan distance, all of which are implemented in tsne in MATLAB. Some of them perform as well as the Euclidean distance metric. So why does tsne always use Euclidean distance to calculate distances? Does this metric have any advantages over the other distance metrics? I hope someone can help me with this. Thanks in advance!

score 0 · Accepted Answer · answered Jun 26 '19 at 23:01


t-SNE uses Euclidean distance only because it is the default value of the distance parameter in the function definition, not because the method requires it. If you wish to change the distance function for your particular problem, change that parameter in your function call: it is the 'metric' argument in scikit-learn's TSNE and the 'Distance' name-value argument in MATLAB's tsne.
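To make concrete what swapping the distance function means, here is a minimal pure-Python sketch. The helper names (`pairwise_distances`, `METRICS`) are hypothetical and written only for illustration; they are not MATLAB's or scikit-learn's API. It only shows that the metric is a pluggable choice that changes the distance matrix t-SNE starts from.

```python
import math

# Hypothetical helpers for illustration; not part of any t-SNE library.
def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def manhattan(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))

def chebyshev(a, b):
    return max(abs(x - y) for x, y in zip(a, b))

METRICS = {"euclidean": euclidean, "manhattan": manhattan, "chebyshev": chebyshev}

def pairwise_distances(points, metric="euclidean"):
    """Return the full n x n distance matrix under the named metric."""
    dist = METRICS[metric]
    return [[dist(p, q) for q in points] for p in points]

# Made-up sample points, chosen so the distances are easy to check by hand.
points = [(0.0, 0.0), (3.0, 4.0), (6.0, 8.0)]
for name in METRICS:
    # Distances from the first point to every point, under each metric.
    print(name, pairwise_distances(points, metric=name)[0])
```

The same three points give different distance rows under each metric, which is all that changing the parameter does from the algorithm's point of view.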

Here is a link that lists the different distance functions you can use as a parameter instead of Euclidean.

Hope this helps!


Andrew


Thanks a lot. But I am curious why tsne uses Euclidean distance as the default. Is it better than other distance metrics for t-SNE? – CuishleChen Jun 27 '19 at 08:18
@JichenGuo I don't think it can be considered 'better' than the other distance metrics, since that depends on the problem; it is simply the most commonly used. – Andrew Jun 27 '19 at 20:49

score 0 · Answer 2 · answered Jun 30 '19 at 22:10

Not sure which implementation you are talking about, but in general tSNE works on a distance matrix, and it's up to you to calculate this distance matrix according to what actually makes sense for your data.
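As a rough illustration of why the distance matrix is the only place the metric enters, here is a pure-Python sketch of the high-dimensional similarities computed from a precomputed distance matrix. It assumes one fixed Gaussian bandwidth `sigma` for every point, which is a simplification: real t-SNE tunes a per-point bandwidth to match a target perplexity. The function name and the sample matrix are made up for the example.

```python
import math

def conditional_affinities(dist_matrix, sigma=1.0):
    """Row-normalised Gaussian affinities p(j|i) from a precomputed distance matrix.

    Assumption: a single fixed bandwidth `sigma` for all points. Real t-SNE
    searches for a per-point bandwidth that matches a target perplexity.
    """
    n = len(dist_matrix)
    affinities = []
    for i in range(n):
        # A point is never its own neighbour, so the diagonal weight is zero.
        weights = [
            0.0 if j == i else math.exp(-dist_matrix[i][j] ** 2 / (2.0 * sigma ** 2))
            for j in range(n)
        ]
        total = sum(weights)
        affinities.append([w / total for w in weights])
    return affinities

# Any metric could have produced this matrix; the values are invented.
dist_matrix = [
    [0.0, 1.0, 2.0],
    [1.0, 0.0, 1.0],
    [2.0, 1.0, 0.0],
]
P = conditional_affinities(dist_matrix)
for row in P:
    print([round(p, 3) for p in row])
```

Nothing in the function knows whether the distances are Euclidean, Manhattan, or anything else, so the choice of metric is a modelling decision about the data rather than a property of the algorithm.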

Euclidean and Jaccard distances generally work well. I also got some nice results by using truncated SVD (TSVD) to reduce the data to ~50 dimensions and then running tSNE on the Euclidean distance matrix.
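For set-valued or binary data, where Jaccard distance is the natural choice, the distance matrix could be built along these lines. This is a minimal sketch with a hypothetical helper and invented samples; it treats two empty sets as identical, which is a convention chosen here rather than something the answer specifies.

```python
def jaccard_distance(a, b):
    """1 - |intersection| / |union| for two sets; 0.0 when both are empty (assumed convention)."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)

# Invented samples: each one is the set of features present in an item.
samples = [{"a", "b", "c"}, {"b", "c", "d"}, {"x", "y"}]
dist_matrix = [[jaccard_distance(p, q) for q in samples] for p in samples]
for row in dist_matrix:
    print(row)
```

The resulting matrix can then be handed to whichever tSNE implementation you use, provided it accepts precomputed distances.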
