I am looking for an encoding that maps every string to a unique number such that ->
- Any two strings that are similar must have values close to each other.
- Any two values that are close to each other must represent similar strings.
Two strings are similar if a few substitutions in one string turn it into the other. No insertions or deletions are considered, so the strings being compared have the same length.
The strings can only contain the characters A, C, T and G (only four possibilities).
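To make the similarity measure concrete, here is a minimal sketch of what I mean by "a few substitutions". The function name `hamming` is just my own illustrative choice, and I am assuming both strings have the same length.

```python
def hamming(a: str, b: str) -> int:
    """Count the positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("strings must have equal length (substitutions only)")
    if set(a + b) - set("ACGT"):
        raise ValueError("only the characters A, C, G and T are allowed")
    return sum(x != y for x, y in zip(a, b))


print(hamming("ACGT", "ACGA"))  # 1: one substitution, so the strings are similar
print(hamming("ACGT", "TGCA"))  # 4: every position differs, so they are not
```

The encoding I am after should give the first pair close values and the second pair distant values.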
Things I have tried ->
Gray code -> It satisfies the second criterion but not the first. Two strings that are similar need not have close values in Gray code.
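A small sketch of how the Gray code attempt fails, under my own assumptions: each base is packed into two bits (the mapping in `BITS` is an arbitrary illustrative choice), the resulting bit pattern is read as a reflected binary Gray code word, and the value of the string is the integer that word decodes to. The name `gray_value` is hypothetical.

```python
# Assumed 2-bit code per base; any fixed mapping would show the same effect.
BITS = {"A": 0b00, "C": 0b01, "G": 0b11, "T": 0b10}


def gray_value(s: str) -> int:
    """Pack the string into a Gray code word and decode it to an integer."""
    g = 0
    for ch in s:
        g = (g << 2) | BITS[ch]
    # Standard Gray-to-binary decoding: XOR together all right shifts of g.
    n = 0
    while g:
        n ^= g
        g >>= 1
    return n


print(gray_value("AAAA"))  # 0
print(gray_value("AAAC"))  # 1   (one substitution at the end: values stay close)
print(gray_value("TAAA"))  # 255 (one substitution at the start: values far apart)
```

Consecutive values always differ in a single bit, hence in a single base, which is why the second criterion holds. But a single substitution near the front of the string can move the value across the whole range, which is why the first one fails.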
Hamming distance from a reference string -> If two strings have the same Hamming distance from the reference, that does not mean they are similar to each other, only that they are equally far from the reference. So it does not satisfy the second criterion.
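A quick sketch of that failure, with an arbitrarily chosen reference string and an illustrative helper named `hamming` (both are my own assumptions for the example):

```python
def hamming(a: str, b: str) -> int:
    """Count the positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("strings must have equal length")
    return sum(x != y for x, y in zip(a, b))


ref = "AAAA"  # arbitrary reference string for the example
x, y = "CCAA", "AAGG"

print(hamming(ref, x))  # 2
print(hamming(ref, y))  # 2, the same encoded value as x
print(hamming(x, y))    # 4, yet x and y differ in every position
```

Both strings get the value 2, so the values are as close as they can be while the strings themselves have nothing in common.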
Please suggest a method for this particular problem if you know of one.