I am trying to visualize data with t-SNE using the yellowbrick package, but I am getting an error.

import pandas as pd
from yellowbrick.text import TSNEVisualizer
from sklearn.datasets import make_classification

## produce random data
X, y = make_classification(n_samples=200, n_features=100,
                       n_informative=20, n_redundant=10,
                       n_classes=3, random_state=42)

## visualize data with t-SNE
tsne = TSNEVisualizer()
tsne.fit(X, y)
tsne.poof()

The error (raised by the fit method):

ValueError: The truth value of an array with more than one element
             is ambiguous. Use a.any() or a.all()
galapah
  • Sorry, this is an issue that has come up since the release of numpy 1.13 and will be fixed in the 0.6 version of Yellowbrick. I've opened up an issue for it [here](https://github.com/DistrictDataLabs/yellowbrick/issues/323). – bbengfort Mar 05 '18 at 17:01

1 Answer

After some experimenting with the arguments:

tsne.fit(X, y.tolist())

This raises no error, but produces no output.

Finally, replacing y with a list of strings works:

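y_series = pd.Series(y, dtype="category")
# rename_categories returns a new Series; direct assignment to
# .cat.categories is no longer supported in recent pandas releases
y_series = y_series.cat.rename_categories(["a", "b", "c"])
y_list = y_series.values.tolist()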

tsne.fit(X, y_list)
tsne.poof()

The library is intended for analyzing text datasets, which is perhaps why it is not documented that y needs to be strings. The error message is also unhelpful.
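The same conversion can be done without pandas. The following is a minimal sketch, assuming the labels are consecutive integers starting at 0; `labels_to_strings` and `y_example` are hypothetical names used only for illustration, with `y_example` standing in for the `y` array from the question.

```python
def labels_to_strings(y, names):
    """Map integer class labels to string names by position.

    Hypothetical helper: assumes every label is a valid index into `names`.
    """
    return [names[int(label)] for label in y]


# Stand-in for the integer labels returned by make_classification
y_example = [0, 2, 1, 1, 0]

y_list = labels_to_strings(y_example, ["a", "b", "c"])
print(y_list)
```

The resulting `y_list` can then be passed to `tsne.fit(X, y_list)` in the same way as the pandas version.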

galapah
  • This is a good interim solution; it's not that y needs to be strings, it's just that the visualizer is trying to figure out the class names. Unfortunately, the way numpy arrays are evaluated for falsiness has changed, which led to this error. As I mentioned above, this will be updated in the 0.6 release of yellowbrick, which is coming soon. – bbengfort Mar 05 '18 at 17:03
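The truthiness problem described in the comment can be reproduced with numpy alone. This is only a sketch of the general failure mode, not Yellowbrick's actual code; `has_class_names` is a hypothetical helper showing an explicit check that avoids evaluating an array as a boolean.

```python
import numpy as np


def has_class_names(classes):
    """Return True if `classes` holds at least one entry.

    Hypothetical helper, not taken from Yellowbrick: it tests for None
    and length explicitly instead of relying on truthiness.
    """
    return classes is not None and len(classes) > 0


labels = np.array([0, 1, 2])

# Evaluating a multi-element array as a boolean raises ValueError.
message = None
try:
    if labels:
        pass
except ValueError as err:
    message = str(err)

print(message)

# The explicit check behaves the same for arrays, lists, and None.
print(has_class_names(labels), has_class_names([]), has_class_names(None))
```

A list of the same labels does not raise, because a plain Python list is truthy whenever it is non-empty.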