
Package versions:

numpy: 1.13.3

sklearn: 0.19.0

scipy: 0.19.1

I have a dense matrix svd_matrix

svd_matrix.shape
>>> (30000,50)

I want to embed it with scikit-learn's TSNE implementation, using 'cosine' as the metric:

tsne = sklearn.manifold.TSNE(n_components=2, random_state=0, metric='cosine')
matrix_2d = tsne.fit_transform(svd_matrix)
>>> ValueError: Metric 'cosine' not valid for algorithm 'ball_tree'

However, I get the error above. The same code worked fine last week. I have updated my packages in the meantime (versions listed above), but I did not expect that to cause a problem.

algorithm isn't a parameter of manifold.TSNE, so I can't pass 'brute' as suggested here.

Can anyone suggest what is going wrong here and how I can fix this? Thank you

PyRsquared

1 Answer


In 0.19 (!), TSNE's code appears to use a BallTree for every metric except 'precomputed':

neighbors_method = 'ball_tree'
if (self.metric == 'precomputed'):
    neighbors_method = 'brute'
knn = NearestNeighbors(algorithm=neighbors_method, n_neighbors=k,
                               metric=self.metric)

Now, which metrics does BallTree allow? Note that 'cosine' is not among them:

from sklearn.neighbors import BallTree
BallTree.valid_metrics
# ['seuclidean', 'hamming', 'dice', 'jaccard', 'matching', 'russellrao',
#  'euclidean', 'kulsinski', 'wminkowski', 'chebyshev', 'mahalanobis',
#  'sokalmichener', 'rogerstanimoto', 'infinity', 'p', 'canberra',
#  'haversine', 'sokalsneath', 'l1', 'minkowski', 'pyfunc', 'l2',
#  'cityblock', 'braycurtis', 'manhattan']
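Since the 'precomputed' branch in the snippet above falls back to brute force, one possible workaround on 0.19.0 is to compute the cosine distances yourself and hand them to TSNE. This is a minimal sketch, not something taken from the TSNE source: the random `X` is a small stand-in for `svd_matrix`, and `init='random'` is set explicitly on the assumption that some later releases reject PCA initialisation for precomputed distances.

```python
import numpy as np
from sklearn.manifold import TSNE
from sklearn.metrics import pairwise_distances

# Small random stand-in for svd_matrix (assumption: the real data is (30000, 50)).
rng = np.random.RandomState(0)
X = rng.rand(200, 50)

# Full pairwise cosine-distance matrix, shape (n_samples, n_samples).
D = pairwise_distances(X, metric='cosine')
# Guard against tiny negative values caused by floating-point rounding.
D = np.clip(D, 0.0, None)

# metric='precomputed' tells TSNE to treat D as distances, not as features.
tsne = TSNE(n_components=2, random_state=0, metric='precomputed',
            init='random')
matrix_2d = tsne.fit_transform(D)
print(matrix_2d.shape)
```

Keep in mind that the distance matrix is dense: for 30000 samples it holds 30000 x 30000 float64 values, roughly 7 GB, so this may not be practical for the full dataset.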

TSNE's code base is under active development, and some heavy recent changes probably explain both your observation (it worked before the upgrade) and the fact that the metric is not validated before the work starts.

This pull request seems to add support for the cosine metric by not using BallTree in that case. As it appears to have been merged, I think it would work if you install sklearn from the current master branch.

Edit: it actually works (as expected) on the master branch!

The following demo, which is not meaningful in itself, runs without errors after installing sklearn from the current master branch (commit e049b1d35fba9fa688d81a6511be38a73ae824cc; 17.10.2017).

from sklearn.datasets import make_blobs
from sklearn.manifold import TSNE

X, y = make_blobs(n_samples=10, centers=3, n_features=2,
              random_state=0)

tsne = TSNE(n_components=2, random_state=0, metric='cosine')
matrix_2d = tsne.fit_transform(X)
# OK!
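If installing from master is not an option, another approach (a swapped-in technique, not part of the pull request mentioned above) is to L2-normalise the rows and keep the default Euclidean metric. For unit-length vectors the squared Euclidean distance equals twice the cosine distance, so neighbour rankings are the same. The helper name `l2_normalize_rows` and the random data are hypothetical, used only for illustration.

```python
import numpy as np


def l2_normalize_rows(X):
    """Scale every row of X to unit Euclidean length (zero rows stay zero)."""
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    norms[norms == 0] = 1.0  # avoid division by zero for all-zero rows
    return X / norms


# Small random stand-in for svd_matrix.
rng = np.random.RandomState(0)
X = rng.rand(5, 3)
X_unit = l2_normalize_rows(X)

# Cosine distance between the first two original rows.
a, b = X[0], X[1]
cosine_distance = 1.0 - a.dot(b) / (np.linalg.norm(a) * np.linalg.norm(b))

# Squared Euclidean distance between the same rows after normalisation.
squared_euclidean = np.sum((X_unit[0] - X_unit[1]) ** 2)

# The two quantities differ only by a factor of 2.
print(np.isclose(squared_euclidean, 2.0 * cosine_distance))
```

The normalised matrix can then be passed to TSNE(n_components=2, random_state=0) without a metric argument. Whether the resulting embedding is close enough to a true cosine-metric run for your purposes is something to check on your own data.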
sascha
  • Thanks @sascha... although that still didn't seem to work even after installing from the master branch and building it. Hopefully this gets addressed in the next version update. I can't understand why they would get rid of it? – PyRsquared Oct 17 '17 at 14:49
  • Probably not because they want to, but more as a side effect of tough design decisions (and ongoing development). I think there might be a switch from dense to sparse matrices. This alone can be deadly for a lot of stuff (especially metrics that destroy sparsity; but that's just an example, not necessarily true here). – sascha Oct 17 '17 at 14:50
  • @killerT2333 Just fired up my virtual machine. Your task (see my code) actually works when using sklearn from the master branch! – sascha Oct 17 '17 at 15:18
  • I had this problem using version 0.19.0; it was fixed after an upgrade to 0.19.1. – jkyh Dec 22 '17 at 23:57