I know there are many phonetic encoding algorithms, such as Soundex and Metaphone, that map English words to codes representing their pronunciation. But how should the encoded results be compared? For example, if one word is encoded as ABC and another as ABD, how do we quantify the pronunciation similarity of the two words from their codes?
The only approach I can think of so far is to apply a plain text-similarity measure, such as edit distance, to the two phonetic codes. But this seems to defeat the purpose of "pronunciation similarity", because in the end we are only comparing the codes as text. Is there a better way to measure the pronunciation similarity of two words after phonetic encoding? Ideally the similarity would be a score from 0 to 100, where 100 means most similar and 0 means least similar.
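To make the edit-distance baseline concrete, here is a minimal Python sketch of what I mean. The function names `levenshtein` and `code_similarity` are my own, the codes are passed in as plain strings (no particular Soundex or Metaphone library is assumed), and dividing by the length of the longer code is just one possible way to map the distance onto 0-100.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance counting insertions, deletions, and substitutions."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # delete ch_a
                current[j - 1] + 1,      # insert ch_b
                previous[j - 1] + cost,  # substitute (or match)
            ))
        previous = current
    return previous[-1]


def code_similarity(code_a: str, code_b: str) -> float:
    """Map edit distance between two phonetic codes to a 0-100 score.

    Assumption: normalize by the longer code's length, so identical
    codes score 100 and codes with nothing in common score 0.
    """
    longest = max(len(code_a), len(code_b))
    if longest == 0:
        return 100.0  # two empty codes are treated as identical
    return 100.0 * (1 - levenshtein(code_a, code_b) / longest)


print(round(code_similarity("ABC", "ABD"), 1))  # 66.7
```

This scores every symbol substitution the same, so C versus D costs as much as any other mismatch regardless of how close the underlying sounds are, which is exactly the limitation I am asking about.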