I am using the Python implementation of t-SNE for dimensionality reduction on X, which contains 100 instances, each described by 1024 parameters, for CNN visualization.

X.shape = [100, 1024]

X.dtype = float32

When I run:

Y = tsne.tsne(X)

The first warning appears at tsne.py, line 23:

RuntimeWarning: divide by zero encountered in log
  H = Math.log(sumP) + beta * Math.sum(D * P) / sumP

Then there are a couple more warnings like this one on the following lines:

RuntimeWarning: invalid value encountered in divide

Finally, I get this result after each iteration during processing:

Iteration xyz : error is nan

The code ends without "errors" and I get an empty scatter plot at the end.

EDIT:

I tried it with a different data set and it worked perfectly. However, I need it to work on my first set as well (the one that seems to cause the problem).

Question:

Does anyone know what might be causing this? Is there a workaround?

Julep
  • The problem is that, since it uses PCA at the beginning, it requires at least as many samples in your dataset as you have features. Are you able to get more samples for your dataset? – Salvador Medina Apr 25 '16 at 08:12
  • It is for a medical application, so I cannot easily get more samples. I worked around it by adding a small value (e.g. 0.0001) to avoid a division by 0 at the line that caused the nan. – Julep May 07 '16 at 17:28
  • I'm getting the same error message, but I have thousands of samples (I've been trying subsets) and only 39 features... – arrey Jul 05 '16 at 18:32
  • Had "error is nan" issues using the t-sne implementation from the creator's website but not with the sklearn.manifold.TSNE one – FizBack Jul 14 '16 at 05:05
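
Following up on FizBack's comment, here is a minimal sketch of the scikit-learn route, assuming scikit-learn is installed. The random X is only a stand-in with the shape and dtype given in the question, and the perplexity, init and random_state values are illustrative choices, not settings taken from this thread.

```python
import numpy as np
from sklearn.manifold import TSNE

# Stand-in data with the same shape and dtype as in the question
# (the real X from the question is not available here).
rng = np.random.default_rng(0)
X = rng.standard_normal((100, 1024)).astype(np.float32)

# Illustrative settings; perplexity should stay below the number of samples.
tsne = TSNE(n_components=2, perplexity=30.0, init="pca", random_state=0)
Y = tsne.fit_transform(X)

print(Y.shape)               # one 2-D point per input row
print(np.isfinite(Y).all())  # check that the embedding contains no nan/inf
```

If the embedding of the real data is finite here, that would point at the numerical handling in tsne.py rather than at the data set itself.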

1 Answer

# Add machine epsilon so sumP is never exactly zero before log() and the division
sumP = np.sum(P) + np.finfo(np.double).eps
H = np.log(sumP) + beta * np.sum(D * P) / sumP

This should fix the problem: the added machine epsilon keeps sumP from being exactly zero.

  • I've just had the same problem. This worked perfectly. Could someone explain what was causing the error and why this one line fixes it? – arctic.queenolina Aug 20 '19 at 13:13
  • As the warning says, "divide by zero": in some cases the algorithm divides by zero. "+np.finfo(np.double).eps" adds a very small value to sum(P) to prevent the division by zero. @arctic.queenolina – Mohammad Javad Nov 18 '19 at 21:20
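
To make the explanation above concrete, here is a small sketch of one way sumP can become exactly zero. It assumes P is computed as exp(-D * beta), so that large distances underflow to 0.0. The function hbeta is a simplified stand-in modelled on the two lines in the answer, not the original tsne.py code, and the distances in D are made up for illustration.

```python
import numpy as np

def hbeta(D, beta=1.0, guard=False):
    """Entropy H and normalised row P for one point (simplified stand-in)."""
    P = np.exp(-D * beta)               # underflows to 0.0 for large D * beta
    sumP = np.sum(P)
    if guard:
        sumP += np.finfo(np.double).eps  # the fix from the answer
    H = np.log(sumP) + beta * np.sum(D * P) / sumP
    return H, P / sumP

# Hypothetical squared distances, large enough that exp(-D) is exactly 0.0
D = np.array([800.0, 900.0, 1000.0])

# Without the guard: log(0) and 0/0, silenced here so only the values show.
with np.errstate(divide="ignore", invalid="ignore"):
    H_bad, P_bad = hbeta(D)

# With the guard: every intermediate value stays finite.
H_ok, P_ok = hbeta(D, guard=True)

print(H_bad, P_bad)
print(H_ok, P_ok)
```

In this contrived case the guarded row P_ok is finite but all zeros, so the epsilon removes the nan without recovering any neighbourhood information for that point. If the real data has distances on this scale, rescaling X before calling tsne might also be worth trying.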