Since this must be quite a common scenario I'm wondering whether there are any pre-existing solutions to the following.
Lets say I have a set of N strings, and I am computing distances the distances between them. In this case it's a Hamming distance, but that's not really important.
If I wanted to make this as quick as possible, I would avoid self comparisons like so:
def hamming_distance(string1, string2):
"""Return the Hamming distance between equal-length sequences"""
if len(string1) != len(string2):
raise ValueError("Undefined for sequences of unequal length")
return sum(ch1 != ch2 for ch1, ch2 in zip(s1, s2))
ratios=[]
for a, i in enumerate(string_list):
for b, j in enumerate(string_list):
if a == b: # Avoid self comparisons for speed
break
ratios.append(hamming_distance(string_list[i], string_list[j]))
return ratios
But since this is 'symmetric', I could also throw away any reciprocal comparisons which would increase the speed if the strings were numerous and/or large.
Is there a generally accepted/elegant way of doing this in the above set up?
I'm also aware that in general it's advised to avoid nested loops as they can be slow - so if there is a better way of achieving this pairwise iteration over lists (maybe something in collections
?) and incorporating the avoidance of self and reciprocal comparisons, I'm all ears.