I am trying to visualize my dataset (which is stored as a Pandas DataFrame) using T-SNE with the following code:

from sklearn.manifold import TSNE

N = 10000
df_subset = df.sample(n=N, random_state=1)  # random sample of N rows
data_subset = df_subset.values
tsne = TSNE(n_components=2, verbose=1, perplexity=40, n_iter=300)
tsne_results = tsne.fit_transform(data_subset)  # raises the ValueError

However, this gives the following error: "ValueError: setting an array element with a sequence." for the last line of the code above.

The variable data_subset looks like this: [image: data_subset]

I also tried df_subset.to_numpy(), which gave the same error. A single element of the array data_subset looks like this: [image: single element of data_subset]

robot
  • It seems your data mixes several kinds of data structures. Your data_subset is an array of numbers, arrays, and tuples. What [`fit_transform`](https://scikit-learn.org/stable/modules/generated/sklearn.manifold.TSNE.html#sklearn.manifold.TSNE.fit_transform) needs is an ndarray of shape (n_samples, n_features) or (n_samples, n_samples) containing only numbers (no nested data structures) – Yuchen Zhang Dec 06 '22 at 18:00
  • I am sorry, but I have not been able to convert my array into a simple array with no complex data structures, as you suggested. Can you give me any tips on how to do this? So far I only know of functions like flatten(), but that is not what I need: it leaves the complex structures inside the array as they are. I appreciate any tip! – robot Dec 07 '22 at 08:36
  • t-SNE basically maps an ndarray of shape (n_samples, n_features) to one of shape (n_samples, 2). You have to decide which features you want to keep in your array. For example, I believe your `df_subset.to_numpy()` has 16 features per row. For each non-numeric feature you need to decide how to convert it into numbers. Say index 4 (0-based) is an array: you have to reduce it to a number. In the end each row will be [d1,d2,...,d16], all plain numbers, and t-SNE will reduce those 16 dimensions to 2 – Yuchen Zhang Dec 08 '22 at 01:06
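
One possible way to do the conversion discussed in the comments above is sketched here. It is not the only option: instead of reducing each array-valued cell to a single number, it spreads each such cell over several numeric columns. The DataFrame `toy` and the helpers `flatten_cell` and `to_numeric_matrix` are hypothetical names made up for this illustration, and the sketch assumes that all cells within one column flatten to the same length.

```python
import numpy as np
import pandas as pd


def flatten_cell(value):
    """Return a 1-D float array for a scalar, list, tuple, or ndarray cell."""
    return np.atleast_1d(np.asarray(value, dtype=float)).ravel()


def to_numeric_matrix(df):
    """Flatten every cell and stack the pieces side by side.

    Assumption: within one column, every cell flattens to the same length.
    """
    blocks = []
    for name in df.columns:
        cells = [flatten_cell(v) for v in df[name]]
        lengths = {len(c) for c in cells}
        if len(lengths) != 1:
            raise ValueError(
                f"column {name!r} has cells of differing lengths: {sorted(lengths)}"
            )
        blocks.append(np.vstack(cells))  # shape (n_samples, cell_length)
    return np.hstack(blocks)  # shape (n_samples, total_features)


# Hypothetical stand-in for the real DataFrame: one scalar column,
# one column of arrays, and one column of tuples.
toy = pd.DataFrame({
    "a": [1.0, 2.0, 3.0],
    "b": [np.array([0.1, 0.2]), np.array([0.3, 0.4]), np.array([0.5, 0.6])],
    "c": [(1, 2, 3), (4, 5, 6), (7, 8, 9)],
})

X = to_numeric_matrix(toy)
print(X.shape, X.dtype)  # (3, 6) float64
```

A matrix built this way contains only floats, so it could be passed to `tsne.fit_transform(X)` in place of `data_subset`. Whether flattening is appropriate, or whether a cell should instead be summarized by a single number such as its mean, depends on what each column represents.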

0 Answers