I am looking for ways to improve the accuracy of the TF-IDF weighting scheme in string matching (similarity). The main issue is that TF-IDF is sensitive to typographical errors in strings, and most large datasets tend to contain typos. I realised that variants of edit distance (character-based similarity metrics such as Levenshtein, affine gap, Jaro and Jaro-Winkler) are suitable for computing similarity between strings that contain typographical errors, but not suitable when the words in the strings are out of order.
Hence I would like to use the typo-tolerance of edit distance to enhance the accuracy of TF-IDF.
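To make the trade-off concrete, here is a minimal sketch in plain Python. The example strings, the helper names, the whitespace tokenizer and the smoothed IDF formula are all my own illustrative assumptions, not taken from any particular library or dataset.

```python
import math
from collections import Counter


def levenshtein(a, b):
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution or match
        prev = cur
    return prev[-1]


def edit_similarity(a, b):
    """Edit distance normalised to [0, 1]; 1.0 means identical strings."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest


def tfidf_vectors(docs):
    """Word-level TF-IDF vectors as dicts (assumed smoothed IDF variant)."""
    tokenized = [d.lower().split() for d in docs]
    n = len(tokenized)
    df = Counter(t for toks in tokenized for t in set(toks))
    idf = {t: math.log((1 + n) / (1 + c)) + 1 for t, c in df.items()}
    return [{t: tf * idf[t] for t, tf in Counter(toks).items()}
            for toks in tokenized]


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


# Hypothetical records: the original, a reordered copy, and a copy with typos.
docs = ["john smith", "smith john", "jhon smiht"]
vecs = tfidf_vectors(docs)

tfidf_reordered = cosine(vecs[0], vecs[1])        # close to 1.0: word order is ignored
tfidf_typo = cosine(vecs[0], vecs[2])             # 0.0: no token matches exactly
edit_reordered = edit_similarity(docs[0], docs[1])
edit_typo = edit_similarity(docs[0], docs[2])     # 0.6: four character edits out of ten

print(tfidf_reordered, tfidf_typo, edit_reordered, edit_typo)
```

With these strings, TF-IDF treats the reordered pair as a near-perfect match but gives the typo pair a score of zero, while the normalised edit distance scores the typo pair higher than the reordered pair. That is exactly the gap I would like to close.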
Any ideas on how to address this challenge will be highly appreciated.
Thanks in advance.