I would like to calculate the Hamming distance between two strings of variable length in Matlab. For fixed length strings the following syntax solves my problem:

str1 = 'abcde';
str2 = 'abedc';

sum(str1 ~= str2)

ans = 2

How can I do this efficiently for variable length strings?

Thank you!

EDIT: Because this is a legitimate question: for every character by which one string is longer than the other, the Hamming distance should be incremented by one. So, for example, for

str1 = 'abcdef';
str2 = 'abc';

The answer should be 3.

smonsays
  • Your code works for variable length strings. Or do you mean one string can be longer than the other? If so, how do you define Hamming distance for that case? – Luis Mendo Mar 27 '17 at 17:23
  • Good question, I'll add an explanation to the question. – smonsays Mar 27 '17 at 19:10

2 Answers

Here's a way to do it:

str1 = 'abcdef';
str2 = 'abc';
clear t
t(1,:) = str1+1; % +1 to make sure there are no zeros
t(2,1:numel(str2)) = str2+1; % if needed, this right-pads with zero or causes t to grow
result = sum(t(1,:)~=t(2,:));
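
The same definition can also be written without building a padded matrix. The following is only a minimal sketch of that idea: count mismatches over the overlapping prefix, then add one for each extra character. The names `n` and `d` are illustrative and do not come from the answer above.

```matlab
str1 = 'abcdef';
str2 = 'abc';

n = min(numel(str1), numel(str2));       % length of the overlapping prefix
d = sum(str1(1:n) ~= str2(1:n)) ...      % mismatches inside the overlap
    + abs(numel(str1) - numel(str2));    % one per extra character in the longer string
```

For the strings above, the overlap contributes no mismatches and the length difference contributes 3, which agrees with the value requested in the question.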
Luis Mendo

Although @LuisMendo's answer works for the given example (which might be good enough for you), it will not work for this one, where the shorter string appears at an offset inside the longer one:

str1 = 'abcdef';
str2 = 'bcd';
clear t
t(1,:) = str1+1; % +1 to make sure there are no zeros
t(2,1:numel(str2)) = str2+1; % if needed, this right-pads with zero or causes t to grow
result = sum(t(1,:)~=t(2,:)) % result = 6

To handle the case where the shorter string appears in the middle of the longer one, you should check every possible alignment and take the minimum. One way to do that is:

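str1 = 'bcd'; % shorter string
str2 = 'abcdef'; % longer string (assumes length(str1) <= length(str2))
len1 = length(str1);
len2 = length(str2);
n = len2 - len1; % largest offset of str1 inside str2
str1rep_temp = repmat(str1,[1,n+1]); % n+1 copies of str1, one per offset
str1rep = -ones(n+1,len2); % one row per offset; -1 never equals a character code
str1rows = repmat(1:n+1,[len1,1]); % row index for each copied character
str1cols = bsxfun(@plus,(1:len1)',0:n); % column index: position in str1 plus offset
str1idxs = sub2ind(size(str1rep),str1rows(:),str1cols(:));
str1rep(str1idxs) = str1rep_temp; % row k holds str1 shifted by k-1
str2rep = double(repmat(str2,[n+1, 1])); % str2 repeated once per row
res = min(sum(str1rep ~= str2rep,2)); % distance at the best alignment; res = 3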
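
For readers who find the index arithmetic hard to follow, the same search over alignments can be sketched as a plain loop. This is an illustrative rewrite, not the code above: the names `best`, `window`, and `d` are made up here, and it assumes `str1` is no longer than `str2`.

```matlab
str1 = 'bcd';       % shorter string (assumed: numel(str1) <= numel(str2))
str2 = 'abcdef';    % longer string
len1 = numel(str1);
len2 = numel(str2);

best = inf;
for k = 0:(len2 - len1)                        % each offset of str1 inside str2
    window = str2(k+1:k+len1);                 % part of str2 aligned with str1
    d = sum(str1 ~= window) + (len2 - len1);   % mismatches plus uncovered characters
    best = min(best, d);
end
```

Each iteration corresponds to one row of `str1rep`: characters of `str2` outside the window always count as mismatches, just as the `-1` padding does in the vectorized version. The loop trades speed on long inputs for a smaller memory footprint.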
user2999345
  • Ok, I didn't even think of doing it like this, but this is an interesting approach anyways. Maybe I will try that later on, thank you! – smonsays Mar 28 '17 at 17:54