
The following is a problem from an assignment that I am trying to solve:

Visualization of similarity matrix. Represent every sample with a four-dimension vector (sepal length, sepal width, petal length, petal width). For every two samples, compute their pair-wise similarity. You may do so using the Euclidean distance or other metrics. This leads to a similarity matrix where the element (i,j) stores the similarity between samples i and j. Please sort all samples so that samples from the same category appear together. Visualize the matrix using the function imagesc() or any other function.

Here is the code I have written so far:

load('iris.mat'); % create a table of the data
iris.Properties.VariableNames = {'Sepal_Length' 'Sepal_Width' 'Petal_Length' 'Petal_Width' 'Class'}; % change the variable names to their actual meaning
iris_copy = iris(1:150,{'Sepal_Length' 'Sepal_Width' 'Petal_Length' 'Petal_Width'}); % make a copy of the (numerical) features of the table
iris_distance = table2array(iris_copy); % convert the table to an array

% pairwise similarity
D = pdist(iris_distance); % calculate the Euclidean distance and store the result in D
W = squareform(D); % convert to squareform
figure()
imagesc(W); % visualize the matrix

Now, I think I've got the coding mostly right to answer the question. My issue is how to sort all the samples so that samples from the same category appear together because I got rid of the names when I created the copy. Is it already sorted by converting to squareform? Other suggestions? Thank you!

MatthewS

1 Answer


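It should be in the same order as the original data: squareform only rearranges the distance vector returned by pdist into a square matrix, so row and column i of W correspond to row i of iris_distance. While you could sort the matrix afterwards, the easiest solution is to sort your data by class after line 2 and before line 3 of your code.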

load('iris.mat'); % create a table of the data
iris.Properties.VariableNames = {'Sepal_Length' 'Sepal_Width' 'Petal_Length' 'Petal_Width' 'Class'}; % change the variable names to their actual meaning
iris = sortrows(iris, 'Class'); % sort the table on the "Class" variable so samples from the same category are adjacent
% If you store the sorted table under a different name, use that name in the next line too.
iris_copy = iris(1:150,{'Sepal_Length' 'Sepal_Width' 'Petal_Length' 'Petal_Width'}); % make a copy of the (numerical) features of the table

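Consider using sortrows. For example, sortrows(iris,'Class') sorts the table by its Class variable. The documentation also describes sorting by row names: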

tblB = sortrows(tblA,'RowNames') sorts a table based on its row names. Row names of a table label the rows along the first dimension of the table. If tblA does not have row names, that is, if tblA.Properties.RowNames is empty, then sortrows returns tblA.
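Putting the pieces together, here is a minimal sketch of the sort-then-visualize approach. It assumes the Statistics and Machine Learning Toolbox is installed and uses its bundled fisheriris data (variables meas and species) as a stand-in for your iris.mat, which I don't have. The table name T and the shuffling step are only there for illustration.

```matlab
% Stand-in data (assumption): fisheriris ships with the Statistics and
% Machine Learning Toolbox and provides meas (150x4) and species (150x1 cell).
load fisheriris

% Build a table with the same variable names used in the question.
T = array2table(meas, 'VariableNames', ...
    {'Sepal_Length', 'Sepal_Width', 'Petal_Length', 'Petal_Width'});
T.Class = categorical(species);

% Shuffle the rows so that the effect of sorting is visible (illustration only).
T = T(randperm(height(T)), :);

% Sort by the Class variable so samples from the same category are adjacent.
T = sortrows(T, 'Class');

% Pairwise Euclidean distances on the four numeric features, in sorted order.
X = T{:, 1:4};
W = squareform(pdist(X));

figure();
imagesc(W);      % block structure along the diagonal reflects the three classes
colorbar;
axis square;
```

Because the sort happens before pdist, the rows and columns of W follow the sorted class order and no reordering of the matrix is needed afterwards.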

timbo
  • The original data is already sorted by class, so I'm thinking it's already sorted. – MatthewS Mar 02 '19 at 01:39
  • If that's the case, then it should be. It's easy to check a few though. Just calculate a few Euclidean distances by hand and make sure they pop up in the table where they should. It should work fine. – timbo Mar 02 '19 at 06:46
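The check suggested in the last comment could look roughly like this. It is a sketch that again assumes the toolbox's fisheriris data in place of the asker's iris.mat. The sample indices i and j are arbitrary picks, and the grouping test only confirms that each class forms one contiguous block.

```matlab
% Stand-in data (assumption): meas (150x4) and species (150x1 cell)
% from the Statistics and Machine Learning Toolbox.
load fisheriris
X = meas;
W = squareform(pdist(X));

% Check that the labels are already grouped: with 'stable', unique numbers
% the classes in order of first appearance, so grouped data gives a
% non-decreasing index vector.
[~, ~, idx] = unique(species, 'stable');
isGrouped = issorted(idx);

% Compute one Euclidean distance by hand and compare it with the matrix.
i = 1;   % arbitrary sample from the first class
j = 51;  % arbitrary sample from the second class
dManual = sqrt(sum((X(i, :) - X(j, :)).^2));

fprintf('grouped by class: %d\n', isGrouped);
fprintf('manual: %.6f, W(i,j): %.6f, W(j,i): %.6f\n', dManual, W(i, j), W(j, i));
```

If the manual value matches both W(i,j) and W(j,i), the matrix is in the same order as the input rows.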