Can I use t-SNE when the dimension is larger than the number of data?

Question

I am using t-SNE with the MATLAB code from this web site (https://lvdmaaten.github.io/tsne/). However, I get an error whenever I run it on data whose dimension is larger than the number of data points. The error always occurs at this line of the code:

M = M(:,ind(1:initial_dims));

The error is:

Index exceeds matrix dimensions.
Error in tsne (line 62)
    M = M(:,ind(1:initial_dims));

I call the tsne function from MATLAB with this command:

output = tsne(input, [], 2, 640, 30);

The input is 162x640: the dimension is 640 and the number of data points is 162. The program below is the code from the website above.

function ydata = tsne(X, labels, no_dims, initial_dims, perplexity)
%TSNE Performs symmetric t-SNE on dataset X
%
%   mappedX = tsne(X, labels, no_dims, initial_dims, perplexity)
%   mappedX = tsne(X, labels, initial_solution, perplexity)
%
% The function performs symmetric t-SNE on the NxD dataset X to reduce its 
% dimensionality to no_dims dimensions (default = 2). The data is 
% preprocessed using PCA, reducing the dimensionality to initial_dims 
% dimensions (default = 30). Alternatively, an initial solution obtained
% from another dimensionality reduction technique may be specified in
% initial_solution. The perplexity of the Gaussian kernel that is employed
% can be specified through perplexity (default = 30). The labels of the
% data are not used by t-SNE itself, however, they are used to color
% intermediate plots. Please provide an empty labels matrix [] if you
% don't want to plot results during the optimization.
% The low-dimensional data representation is returned in mappedX.
%
%
% (C) Laurens van der Maaten, 2010
% University of California, San Diego

if ~exist('labels', 'var')
    labels = [];
end
if ~exist('no_dims', 'var') || isempty(no_dims)
    no_dims = 2;
end
if ~exist('initial_dims', 'var') || isempty(initial_dims)
    initial_dims = min(50, size(X, 2));
end
if ~exist('perplexity', 'var') || isempty(perplexity)
    perplexity = 30;
end

% First check whether we already have an initial solution
if numel(no_dims) > 1
    initial_solution = true;
    ydata = no_dims;
    no_dims = size(ydata, 2);
    perplexity = initial_dims;
else
    initial_solution = false;
end

% Normalize input data
X = X - min(X(:));
X = X / max(X(:));
X = bsxfun(@minus, X, mean(X, 1));

% Perform preprocessing using PCA
if ~initial_solution
    disp('Preprocessing data using PCA...');
    if size(X, 2) < size(X, 1)
        C = X' * X;
    else
        C = (1 / size(X, 1)) * (X * X');
    end
    [M, lambda] = eig(C);
    [lambda, ind] = sort(diag(lambda), 'descend');
    M = M(:,ind(1:initial_dims));
    lambda = lambda(1:initial_dims);
    if ~(size(X, 2) < size(X, 1))
        M = bsxfun(@times, X' * M, (1 ./ sqrt(size(X, 1) .* lambda))');
    end
    X = bsxfun(@minus, X, mean(X, 1)) * M;
    clear M lambda ind
end

% Compute pairwise distance matrix
sum_X = sum(X .^ 2, 2);
D = bsxfun(@plus, sum_X, bsxfun(@plus, sum_X', -2 * (X * X')));

% Compute joint probabilities
P = d2p(D, perplexity, 1e-5);                                           % compute affinities using fixed perplexity
clear D

% Run t-SNE
if initial_solution
    ydata = tsne_p(P, labels, ydata);
else
    ydata = tsne_p(P, labels, no_dims);
end

I am trying to understand this code but I cannot understand the part where the error occurs.

if size(X, 2) < size(X, 1)
    C = X' * X;
else
    C = (1 / size(X, 1)) * (X * X');
end

Why is this condition needed? Since X is 162x640, the else branch is executed, and I guess this is the problem: in the else branch, C is 162x162. However, a few lines later, in

M = M(:,ind(1:initial_dims));

initial_dims, which equals 640, is used. Am I using this code the wrong way, or does it simply not work for a data set like mine?
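A minimal sketch that reproduces the shape problem in isolation, using a random 162x640 matrix as a stand-in for the asker's data (an assumption; any data of that size hits the same indexing limit). It mirrors the else branch of the PCA step in tsne.m: eig of the 162x162 matrix returns only 162 eigenvectors, so ind has 162 entries and ind(1:640) cannot succeed.

```matlab
% Random stand-in for the asker's data: N = 162 samples, D = 640 dimensions
N = 162;
D = 640;
X = randn(N, D);
X = bsxfun(@minus, X, mean(X, 1));   % center columns, as tsne.m does

% Same branch tsne.m takes when size(X, 2) >= size(X, 1): an N-by-N matrix
C = (1 / size(X, 1)) * (X * X');
[M, lambda] = eig(C);
[lambda, ind] = sort(diag(lambda), 'descend');

fprintf('C is %dx%d, eig returned %d eigenvectors\n', size(C, 1), size(C, 2), size(M, 2));

initial_dims = 640;
try
    M = M(:, ind(1:initial_dims));    % fails: ind has only N = 162 entries
catch err
    disp(err.message);                % exact wording depends on the MATLAB/Octave version
end
```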

score 1 · Accepted Answer · answered Feb 29 '16 at 07:23

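According to the documentation: "The data is preprocessed using PCA, reducing the dimensionality to initial_dims dimensions (default = 30)." So initial_dims is the number of principal components to keep, not the original dimension of your data, and it cannot exceed the number of eigenvectors the PCA step produces. With 162 data points the else branch eigendecomposes a 162x162 matrix, so ind has only 162 entries and ind(1:640) runs past its end. You should leave this parameter at its default (pass []) the first time.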
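A minimal sketch of a guard for initial_dims, assuming you call tsne.m as in the question. The helper clamp_initial_dims is hypothetical and not part of the tsne package. It caps the value at min(N - 1, D): after the mean-centering in tsne.m, an N-by-D data matrix has rank at most min(N - 1, D), so in the else branch the N-th eigenvalue is numerically zero and the normalization 1 ./ sqrt(size(X, 1) .* lambda) would divide by (almost) zero for that component. Passing [] instead lets the code pick min(50, size(X, 2)), which is 50 here; with 50 or fewer data points the default can run into the same limit.

```matlab
% Hypothetical helper (not part of the tsne package): largest usable initial_dims.
% After mean-centering, an N-by-D data matrix has rank at most min(N - 1, D).
clamp_initial_dims = @(requested, X) min([requested, size(X, 1) - 1, size(X, 2)]);

input = randn(162, 640);                 % random stand-in with the question's shape
dims = clamp_initial_dims(640, input);   % 161 for a 162x640 input
fprintf('using initial_dims = %d\n', dims);

% With the tsne package on the path, either call avoids the indexing error:
% output = tsne(input, [], 2, [], 30);     % default: min(50, size(X, 2))
% output = tsne(input, [], 2, dims, 30);
```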

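The condition if size(X, 2) < size(X, 1) picks the smaller of two matrices to eigendecompose: the DxD matrix X' * X when there are more data points (N) than dimensions (D), and otherwise the NxN matrix X * X'. The else branch then maps the eigenvectors back to D dimensions with X' * M, so both routes give the same leading principal directions, but working with the smaller matrix (the same idea as an economy-size SVD) makes the computation faster.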
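A small numerical check of why the else branch works, as a sketch with random data and arbitrary sizes (N, D and k are assumptions, not values from tsne.m). If (1/N) * X * X' * u = lambda * u, then X' * X * (X' * u) = N * lambda * (X' * u), and norm(X' * u)^2 = N * lambda. So v = X' * u / sqrt(N * lambda) is a unit-norm eigenvector of X' * X, which is what the bsxfun(@times, X' * M, ...) line in tsne.m computes.

```matlab
% Random stand-in data: more dimensions (D) than samples (N), as in the question
N = 20;
D = 100;
k = 5;                                    % number of components to keep
X = randn(N, D);
X = bsxfun(@minus, X, mean(X, 1));        % center columns

% Small route, as in tsne.m's else branch: eigendecompose the N-by-N matrix
C_small = (1 / N) * (X * X');
C_small = (C_small + C_small') / 2;       % enforce exact symmetry against round-off
[U, L] = eig(C_small);
[lambda, ind] = sort(diag(L), 'descend');
U = U(:, ind(1:k));
lambda = lambda(1:k);

% Map back to D-dimensional directions: v = X' * u / sqrt(N * lambda)
V = bsxfun(@times, X' * U, (1 ./ sqrt(N .* lambda))');

% V should hold orthonormal eigenvectors of X' * X with eigenvalues N * lambda
residual = norm(X' * X * V - bsxfun(@times, V, (N .* lambda)'));
orthogonality = norm(V' * V - eye(k));
fprintf('residual %.2e, orthogonality error %.2e\n', residual, orthogonality);
```

Only the D x k matrix V is ever formed, so the D x D matrix X' * X is not needed for the decomposition itself; it appears here only to check the result.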
