How does sklearn.manifold.MDS find point locations when given a precomputed dissimilarity matrix?

Question

I am using sklearn.manifold.MDS to find an MDS solution from precomputed (non-Euclidean) distances, i.e., I set the parameter dissimilarity='precomputed' and pass my own dissimilarity matrix. According to the API documentation, in this mode the dissimilarity matrix is passed directly to fit and fit_transform, rather than being computed as pairwise Euclidean distances between the input samples (the default setting).

However, when I look at the source code for fit and fit_transform, it looks like the dissimilarities are used as the disparities (the target pairwise distances, i.e., the distances in the data), while the distances fitted to those disparities are still Euclidean distances computed inside the SMACOF algorithm. That is, starting from a random configuration of points, SMACOF chooses the point locations that minimize the stress, which is the deviation between the disparities (here, the dissimilarity matrix) and the pairwise Euclidean distances between the embedded points.

In other words, with dissimilarity='euclidean', the package computes pairwise Euclidean distances from the data and uses them as the disparities; with dissimilarity='precomputed', it uses the user-supplied dissimilarity matrix as the disparities. In both cases, the point locations are fitted using Euclidean distances between the embedded points.
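To make this reading concrete, here is a minimal sketch of what I am doing. The toy data, the city-block metric, and the variable names are my own illustrative assumptions; the stress below is recomputed by hand as half the sum of squared differences over the full matrix, which is how I read the source, not a value taken from the library.

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform
from sklearn.manifold import MDS

# Toy data (assumption): 20 points in 5 dimensions.
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 5))

# Non-Euclidean dissimilarities (city-block chosen only as an example).
D = squareform(pdist(X, metric="cityblock"))

# Pass the matrix itself instead of the raw samples.
mds = MDS(n_components=2, dissimilarity="precomputed", random_state=0)
Y = mds.fit_transform(D)

# Distances between the embedded points, measured the way I believe
# SMACOF measures them internally: with the Euclidean metric.
D_fit = squareform(pdist(Y, metric="euclidean"))

# Hand-computed raw stress between the input matrix and the fitted distances.
raw_stress = 0.5 * np.sum((D_fit - D) ** 2)
print(raw_stress)
```

If my understanding is right, the embedding Y is chosen so that D_fit (Euclidean) approximates D (city-block), and the original metric plays no role in the fitting step.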

Is this understanding correct? If yes, is there any way to make the package find the best fitting embedding (point locations) or to visualize the dissimilarity matrix using some other distance metric? Or is the only option to use some other package (if any), or to re-code relevant aspects of the package?

Many thanks for your help.

What I did and need:

I called sklearn.manifold.MDS with different kinds of precomputed dissimilarity matrices and read the source code. I expected the package to find optimal point locations directly from the dissimilarity matrix, but it appears that it fits pairwise Euclidean distances between the points to the pairwise dissimilarities. I want to know whether this understanding is correct, and whether there is a way to find the best-fitting point locations or to visualize the dissimilarity matrix without using pairwise Euclidean distances (i.e., to use the same distance metric in the fitting process that was used to compute the dissimilarity matrix).
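For illustration, this is roughly the kind of fit I have in mind, written outside scikit-learn. It is only a sketch under my own assumptions: fit_embedding is a hypothetical helper (not part of any package), it minimizes a raw stress with a general-purpose optimizer instead of SMACOF, and it relies on numerical gradients, which may behave poorly for non-smooth metrics such as city-block.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.spatial.distance import pdist


def fit_embedding(d_target, n_points, n_components=2, metric="cityblock", seed=0):
    """Hypothetical helper: minimise raw stress under an arbitrary pdist metric.

    d_target is the condensed (upper-triangle) vector of dissimilarities.
    Returns the embedding, the stress at the random start, and the final stress.
    """
    rng = np.random.default_rng(seed)
    x0 = rng.normal(size=n_points * n_components)  # random starting configuration

    def stress(flat):
        # Distances between embedded points use the SAME metric as d_target.
        Y = flat.reshape(n_points, n_components)
        return np.sum((pdist(Y, metric=metric) - d_target) ** 2)

    result = minimize(stress, x0, method="L-BFGS-B")
    return result.x.reshape(n_points, n_components), stress(x0), result.fun


# Toy example (assumption): city-block distances between 10 random 2-D points.
rng = np.random.default_rng(1)
points = rng.normal(size=(10, 2))
d_target = pdist(points, metric="cityblock")

Y, initial_stress, final_stress = fit_embedding(d_target, n_points=10)
print(initial_stress, final_stress)
```

What I would like to know is whether sklearn.manifold.MDS can be made to do something equivalent, or whether an approach like this outside the package is the only option.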

