I'm having difficulty understanding the following MATLAB code, which calculates the Euclidean distance between data points and cluster points. X is the data to be classified and label corresponds to cluster membership.

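label = ones(1, data_dim);   % placeholder; overwritten by the min call below
[N,~] = size(X);             % N = number of data points (rows of X)
[c,~] = size(clusters);      % c = number of clusters (rows of clusters)
dist = zeros(N,c);           % dist(n,i) = squared distance from point n to cluster i
for i = 1:c
    dist(:,i) = sum(bsxfun(@minus, X, clusters(i,:)).^2, 2);
end

[~,label] = min(dist,[],2);  % label(n) = index of the nearest cluster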

Can anyone explain what is going on here and maybe explain it from first principles without using bsxfun?

Papantonia

1 Answer

  • The for loop iterates over every row of clusters. Each row is presumably the coordinates of one cluster's representative point (its centre).
  • bsxfun(@minus, X, clusters(i,:)) subtracts that cluster row from every row of X. In other words, it outputs a matrix whose first row is X(1,:) - clusters(i,:), whose second row is X(2,:) - clusters(i,:), and so on. Each row is the difference vector between one point in X and cluster point i.
  • Every value is squared (.^2) and the squares are summed along each row (sum(...,2)). That gives a column vector containing the squared Euclidean distance from every point in X to cluster point i. It is stored in column i of dist, so after the loop dist holds the squared distance from every point in X to every row of clusters.
  • min(dist,[],2) takes the minimum along each row of dist (that is, across its columns), so it finds the smallest squared distance for each point in X. The value itself is discarded (the ~) and only its column index is stored in label; that index identifies the cluster with the minimum distance. No square root is needed, because taking it would not change which distance is smallest.
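
To spell the steps above out from first principles, here is a minimal sketch that does the same thing with plain loops and no bsxfun. The toy values of X and clusters, and the helper names d, diff_k and best, are made up for illustration and are not from the original code.

```matlab
% Toy data (assumed for illustration): 3 points and 2 cluster points in 2-D.
X        = [0 0; 4 4; 1 0];
clusters = [0 1; 4 3];

N = size(X, 1);          % number of data points
c = size(clusters, 1);   % number of clusters
d = size(X, 2);          % number of coordinates per point

% dist(n,i) = squared Euclidean distance from point n to cluster point i
dist = zeros(N, c);
for n = 1:N                      % each data point
    for i = 1:c                  % each cluster point
        for k = 1:d              % each coordinate
            diff_k = X(n,k) - clusters(i,k);
            dist(n,i) = dist(n,i) + diff_k^2;   % accumulate squared difference
        end
    end
end

% For each point, find the column (cluster) with the smallest squared distance.
% This is what [~,label] = min(dist,[],2) does in one call.
label = zeros(N, 1);
for n = 1:N
    best = 1;
    for i = 2:c
        if dist(n,i) < dist(n,best)
            best = i;
        end
    end
    label(n) = best;
end
```

For this toy data, dist comes out as [1 25; 25 1; 2 18] and label as [1; 2; 1]. If you want actual distances rather than squared ones, use sqrt(dist); the labels stay the same. In MATLAB R2016b and later, X - clusters(i,:) also works directly through implicit expansion, so bsxfun is not strictly required there either.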
Tom