Levenshtein distance works well for measuring the difference between two words, but not so well for phrases.
Is there a good distance metric for measuring differences between phrases?
For example, suppose phrase 1 is made of n words x_1, x_2, ..., x_n and phrase 2 is made of m words y_1, y_2, ..., y_m. I'd think the two phrases should be fuzzily aligned word by word, each aligned pair of words should get a score for how similar the two words are, and some kind of gap penalty should be applied for unaligned words. These positive and negative scores should then be aggregated in some way. There seem to be some heuristics involved.
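To make the idea concrete, here is a minimal sketch of the kind of scoring I have in mind. It is only an illustration under my own assumptions, not an established metric: it does a Needleman-Wunsch style global alignment over words and uses `difflib.SequenceMatcher.ratio()` as the word similarity. The names `word_similarity`, `phrase_alignment_score`, `gap_penalty`, and `mismatch_offset`, along with their default values, are arbitrary choices of mine.

```python
from difflib import SequenceMatcher


def word_similarity(a: str, b: str) -> float:
    """Return a similarity in [0, 1] between two words (case-insensitive)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def phrase_alignment_score(
    phrase1: str,
    phrase2: str,
    gap_penalty: float = -0.5,      # assumed cost of leaving a word unaligned
    mismatch_offset: float = 0.5,   # assumed shift so dissimilar pairs score negative
) -> float:
    """Score a global word-level alignment of two phrases (higher is more similar)."""
    xs, ys = phrase1.split(), phrase2.split()
    n, m = len(xs), len(ys)

    # dp[i][j] holds the best score for aligning xs[:i] with ys[:j].
    dp = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = i * gap_penalty
    for j in range(1, m + 1):
        dp[0][j] = j * gap_penalty

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Pair score ranges from -mismatch_offset to 1 - mismatch_offset.
            pair = word_similarity(xs[i - 1], ys[j - 1]) - mismatch_offset
            dp[i][j] = max(
                dp[i - 1][j - 1] + pair,      # align xs[i-1] with ys[j-1]
                dp[i - 1][j] + gap_penalty,   # leave xs[i-1] unaligned
                dp[i][j - 1] + gap_penalty,   # leave ys[j-1] unaligned
            )
    return dp[n][m]


if __name__ == "__main__":
    print(phrase_alignment_score("the quick brown fox", "the quikc brown fox"))
    print(phrase_alignment_score("the quick brown fox", "a lazy dog"))
```

The result is not normalized, so longer phrases give scores of larger magnitude. Dividing by max(n, m) might be one way to make scores comparable, but that is exactly the kind of heuristic I'm unsure about.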
Is there an existing solution for measuring the similarity between phrases? Python is preferred, but solutions in other languages are also fine. Thanks.