I have a precomputed distance matrix that I have run through scikit-learn's MDS algorithm. All of the data are needed, and the matrix has been scaled to the range 0-1. I want to turn the result into a plot, so n_components can be at most 3.

I've tried changing several parameters (n_components, random_state, n_init), but I am unable to get the Stress-1 (normalized stress) value below 0.25, which is considered a 'poor' fit.
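For reference, this is a minimal sketch of how I understand Stress-1 could be checked by hand against an embedding. The helper name `stress1` and the toy data are my own assumptions, not part of scikit-learn, and the denominator used here is the sum of squared input dissimilarities (some definitions use the embedded distances instead), so it may not match every reference exactly.

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform


def stress1(dissimilarities, embedding):
    """Hypothetical helper: sqrt(sum (delta - d)^2 / sum delta^2) over pairs i < j."""
    # Condensed (upper-triangle) vector of the input dissimilarities.
    delta = squareform(np.asarray(dissimilarities, dtype=float), checks=False)
    # Pairwise Euclidean distances in the embedding, in the same pair order.
    d = pdist(np.asarray(embedding, dtype=float))
    return np.sqrt(np.sum((delta - d) ** 2) / np.sum(delta ** 2))


# Toy data only (not my real matrix): 30 random points in 5 dimensions.
rng = np.random.default_rng(0)
points = rng.random((30, 5))
delta = squareform(pdist(points))

full_fit = stress1(delta, points)           # the points reproduce their own distances
truncated = stress1(delta, points[:, :3])   # dropping two coordinates loses information
print(full_fit, truncated)
```

If this value differs a lot from `mds1.stress_`, the attribute may be reporting raw rather than normalized stress for the settings in use.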

When I increase n_components to a very high value (n_components = 100), the stress score drops to 0.01. Could I take these 100 dimensions and reduce them further, perhaps with PCA?
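To make the PCA idea concrete, here is a rough sketch of what I have in mind, run on toy data rather than my real matrix. The helper `stress1`, the toy sizes, and the choice of 10 intermediate dimensions are all assumptions for illustration. I would expect the stress of the 3-D PCA projection to be noticeably higher than that of the high-dimensional embedding, so I am not sure this gains anything over running MDS with n_components=3 directly.

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform
from sklearn.decomposition import PCA
from sklearn.manifold import MDS


def stress1(dissimilarities, embedding):
    """Hypothetical helper: sqrt(sum (delta - d)^2 / sum delta^2) over pairs i < j."""
    delta_vec = squareform(np.asarray(dissimilarities, dtype=float), checks=False)
    d_vec = pdist(np.asarray(embedding, dtype=float))
    return np.sqrt(np.sum((delta_vec - d_vec) ** 2) / np.sum(delta_vec ** 2))


# Toy stand-in for the precomputed matrix: 40 random points in 20 dimensions.
rng = np.random.default_rng(0)
delta = squareform(pdist(rng.random((40, 20))))

# Step 1: MDS into a higher-dimensional space (10 here instead of 100).
mds_high = MDS(n_components=10, dissimilarity="precomputed", random_state=1, n_init=1)
X_high = mds_high.fit_transform(delta)

# Step 2: project that embedding down to 3 dimensions with PCA.
X_pca = PCA(n_components=3).fit_transform(X_high)

# Baseline: MDS straight into 3 dimensions.
mds_direct = MDS(n_components=3, dissimilarity="precomputed", random_state=1, n_init=1)
X_direct = mds_direct.fit_transform(delta)

stress_high = stress1(delta, X_high)
stress_pca = stress1(delta, X_pca)
stress_direct = stress1(delta, X_direct)
print(stress_high, stress_pca, stress_direct)
```

Comparing the three printed values on the real matrix should show whether the low stress at high dimension survives the projection to 3 dimensions.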

Any suggestions on how to improve the fit? Should I try a different tool instead?

Here is the code:

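import pandas as pd
from sklearn.manifold import MDS

# Pre-computed distance matrix (square, first column and first row are labels)
df = pd.read_excel('./FTM_fingerprint_FULL_dissimilarity_matrix_MORGAN_1024_2.xlsx', index_col=0, header=0)

# Multidimensional scaling on the precomputed dissimilarities
mds1 = MDS(random_state=1, dissimilarity='precomputed', n_init=16, n_components=3, eps=1e-9)
X_transform = mds1.fit_transform(df)
print(X_transform)

# Stress of the best run (stress_ attribute; assumed here to be the normalized Stress-1)
stress = mds1.stress_
print(stress)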

Thanks
