
I am pretty new to MATLAB, and I want to use it for a clustering job. Suppose I have three columns of values:

id1 id2 distvalue1

id1 id3 distvalue2 ....

id2 id4 distvalue i .....

There are 5000 ids in total, but some id pairs are missing a distance value. In Python I can write loops to import these distance values into matrix form. How can I do that in MATLAB? And how do I let MATLAB know that id1, ..., idx are identifiers and that the third column is the value?

Thanks!

user1830108
  • Where are these values, if they're not already in a matrix? A file? A database? And what do you mean by "let MATLAB know" they're identifiers? How are you intending to access the data? – wakjah May 31 '13 at 20:52
  • It is in a file. I can use importdata('filename') and slice the result into an n*3 matrix in MATLAB, but I want to convert it into a distance matrix for clustering, where the first and second dimensions are indexed by the ids. – user1830108 May 31 '13 at 20:59
  • OK. It's still not clear what your specific problem is... – wakjah May 31 '13 at 21:16
  • It's as simple as: how do I convert the third column (a vector) into a distance matrix? – user1830108 May 31 '13 at 21:34
  • Assuming your data is in matrix `x`, how about `x(:, 3)`? – wakjah May 31 '13 at 21:41

1 Answer


Based on the comments, you know how to get the data into the form of an N x 3 matrix, called X, where X(:,1) is the first index, X(:,2) is the second index, and X(:,3) is the corresponding distance.

Let's assume that the indices (id1... idx) are arbitrary numeric labels.

So then we can do the following:

% First, build a list of all the unique indices    
indx = unique([X(:,1); X(:,2)]);
Nindx = length(indx);

% Second, initialize an empty connection matrix, C
C = zeros(Nindx, Nindx);  %or you could use NaN(Nindx, Nindx)

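% Third, loop over the rows of X, and map them to points in the matrix C
for n = 1:size(X,1)
    row = find(X(n,1) == indx);   % position of the first id in indx
    col = find(X(n,2) == indx);   % position of the second id in indx
    C(row,col) = X(n,3);
    % If the distances are symmetric, also set C(col,row) = X(n,3);
end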

This is not the most efficient method (that would be to remap the indices of X to the range [1... Nindx] in a vectorized manner), but it should be fine for 5000 ids.
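The vectorized remapping mentioned above could look roughly like the sketch below. It relies on the third output of unique, which gives the position of each input element in the sorted list of unique values. The toy matrix X and the names pos and nPairs are made up for illustration; they are not from the question.

```matlab
% Toy input: each row is [id_a, id_b, distance]; the ids are arbitrary numeric labels.
X = [10 20 1.5;
     10 30 2.0;
     20 40 0.7];

% indx holds the sorted unique labels; pos holds, for every stacked id,
% its position in indx, i.e. an index in the range 1..Nindx.
[indx, ~, pos] = unique([X(:,1); X(:,2)]);
Nindx  = numel(indx);
nPairs = size(X, 1);
row = pos(1:nPairs);        % remapped first column
col = pos(nPairs+1:end);    % remapped second column

% Fill all known entries at once via linear indexing.
C = zeros(Nindx);
C(sub2ind([Nindx Nindx], row, col)) = X(:,3);
```

As in the loop version, only the (row, col) entry of each pair is filled; if the distances are meant to be symmetric, the transposed entries would have to be set as well.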

If you end up dealing with very large numbers of unique indices, for which only very few of the index-pairs have assigned distance values, then you may want to look at using sparse matrices -- try `help sparse` -- instead of pre-allocating a large zero matrix.
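As a rough illustration of the sparse alternative, the sketch below builds the matrix directly from the remapped indices with sparse(i, j, v, m, n). The toy data and the names pos, nPairs and S are assumptions for the example only.

```matlab
% Toy input: each row is [id_a, id_b, distance].
X = [10 20 1.5;
     10 30 2.0;
     20 40 0.7];

% Remap the arbitrary labels to 1..Nindx (pos indexes into indx).
[indx, ~, pos] = unique([X(:,1); X(:,2)]);
Nindx  = numel(indx);
nPairs = size(X, 1);

% Only the listed pairs are stored; every other entry is an implicit zero.
S = sparse(pos(1:nPairs), pos(nPairs+1:end), X(:,3), Nindx, Nindx);
```

Two caveats: sparse adds together the values of repeated (row, col) pairs instead of overwriting them, and a missing pair cannot be told apart from a pair whose distance is exactly zero. full(S) converts back to an ordinary matrix if needed.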

cjh