We all know that the Hamming distance of two binary strings is the number of bit positions in which they differ. Now take two binary strings, 1110 and 1101: I want to describe their similarity by the number of matching bits counted from the highest bit. (In this example, going from left to right and counting bits until the two strings differ, the result is 2.) Has this kind of similarity been defined, or does it have a formal name?
-
Isn't this just `floor(log2(a - b))` (or similar)? – Oliver Charlesworth Feb 09 '14 at 10:09
-
@OliCharlesworth: The formula for computing that distance probably looks like that, but I think the question is rather whether it has any *name*. Say, something like *Charlesworth Distance* or the like ;-) – O. R. Mapper Feb 09 '14 at 10:13
-
This question appears to be off-topic because it is about names of things, rather than programming. – Oliver Charlesworth Feb 09 '14 at 10:14
-
@OliCharlesworth: Isn't the name of an algorithm/programming technique very much in scope for programming questions? Or, differently asked, what would be a better fitting place to ask this question? Similar questions such as [this](http://stackoverflow.com/questions/4053152/whats-the-name-of-this-algorithm-routine) or [that](http://stackoverflow.com/questions/5353557/what-is-the-name-of-this-sorting-algorithm) did not receive any close votes on SO, either. *Programming* is not just writing the code, but also includes things such as using, knowing and recognizing design patterns and algorithms. – O. R. Mapper Feb 09 '14 at 10:15
-
Thanks for the comments. I asked this question because I want to use this distance measure in my algorithm, and I would like to find more theoretical support for it and learn more about it. (You know, a name is useful for searching.) – firefly Feb 09 '14 at 10:49
1 Answer
I consulted several of the other faculty at my university and we agree, we've not heard of this :-)
However, these kinds of problems are always interesting, particularly when I've not seen them before... so I've been working on a solution.
As a point of clarification, I am taking your goal to be finding the distance (which I will call Confer distance... hey why not?... I loved O.R. Mapper's comment) between the binary values of two numbers of equal storage width (say two unsigned longs), ignoring all the leading 0s. For example, take the unsigned shorts 54090 and 3374: 54090 = 1101_0011_0100_1010 and 3374 = 0000_1101_0010_1110. Starting from the highest-order 1 (the leftmost) in each, they share the bit pattern 110_1001 before the first discrepancy, so the distance is 7.
Below is a C++ program I wrote to find this distance metric. The functions `find_highest_1` and `confer_dist` are the pertinent ones. Change the `#define` for `T` to any unsigned type, but be warned: if you choose unsigned char, the unimportant and miserably written number-inputting code will not work as you might expect (the stream reads a character rather than a number), but the distance calculations will :-P
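
    #include <climits>
    #include <iostream>
    #include <string>
    using namespace std;

    /* the type chosen for T MUST be unsigned, but any size is fine */
    #define T unsigned short
    #define T_BITS (sizeof(T) * CHAR_BIT)

    // Render num in binary, with '_' in front of every group of four bits.
    string print_bin(T num) {
        string result = "0b";
        for (int i = T_BITS - 1; i >= 0; i--) {
            if ((i + 1) % 4 == 0) result.append("_");
            result.append(to_string((num >> i) & 1));
        }
        return result;
    }

    // Index of the highest set bit (0 = least significant), or -1 if num == 0.
    // Binary search: check whether anything is set in the upper half of the
    // remaining bits; if so, shift it down and remember how far we moved.
    int find_highest_1(T num) {
        int i = -1; // -1 matters here because of how the Confer Distance is found
        if (num != 0) {
            i = 0;
            for (int shift = T_BITS / 2; shift >= 1; shift >>= 1) {
                if (num >> shift) { // is any bit set at position >= shift?
                    num >>= shift;
                    i += shift;
                }
            }
        }
        return i;
    }

    // Number of bits a and b share, counted from each value's highest 1.
    int confer_dist(T a, T b) {
        int len_a = find_highest_1(a) + 1; // significant bits in a
        int len_b = find_highest_1(b) + 1; // significant bits in b
        int min_length = (len_a < len_b) ? len_a : len_b;
        // Drop low-order bits of the longer value so both have min_length bits.
        a >>= len_a - min_length;
        b >>= len_b - min_length;
        // The highest 1 in a ^ b is the first mismatch; if a == b here,
        // find_highest_1 returns -1 and the result is min_length.
        return min_length - find_highest_1(a ^ b) - 1;
    }

    int main()
    {
        T num1, num2;
        cout << "enter two numbers: ";
        cin >> num1 >> num2;
        cout << "num1 = " << print_bin(num1) << endl;
        cout << "num2 = " << print_bin(num2) << endl;
        cout << "Confer dist: " << confer_dist(num1, num2) << endl;
        return 0;
    }
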
I didn't comment this to explain how/why it works, but I'd be happy to if it will benefit anyone.

-
Thanks for your detailed reply. I asked this question because I think this kind of distance may be useful in a binary tree. If the binary code is just the path from the root to a leaf, this distance could be defined as the affinity between two leaves (or are there already similar methods for defining this?). :) – firefly Mar 09 '14 at 09:23