I have two datasets: one contains names and the other contains free text. There are lots of resources on matching similar text regardless of word order using fuzzy matching or TF-IDF, e.g. Jayda Silva Todd, Todd Jayda Silva, Silva Todd Jayda. However, I am unsure how to apply this technique to a free-text field in order to extract any matching name.
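For reference, the order-insensitive matching mentioned above can be sketched with only the standard library. This is a minimal illustration, not any particular library's implementation: the helper name `token_sort_ratio` is hypothetical, and it simply sorts the lowercased words before comparing them with `difflib.SequenceMatcher`, similar in spirit to the token-sort scorers found in fuzzy-matching libraries.

```python
from difflib import SequenceMatcher


def token_sort_ratio(a: str, b: str) -> float:
    """Similarity in [0, 1] that ignores word order and letter case.

    Hypothetical helper for illustration: both strings are lowercased,
    split on whitespace, sorted, and re-joined before being compared.
    """
    norm_a = " ".join(sorted(a.lower().split()))
    norm_b = " ".join(sorted(b.lower().split()))
    return SequenceMatcher(None, norm_a, norm_b).ratio()


# All three orderings normalise to the same string, so the score is 1.0.
print(token_sort_ratio("Jayda Silva Todd", "Todd Jayda Silva"))  # 1.0
print(token_sort_ratio("Jayda Silva Todd", "Silva Todd Jayda"))  # 1.0
```

This works well when both sides are names, but a whole sentence compared against a name scores poorly because most of the sentence is unrelated text, which is the problem described here.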
Names DataFrame:
S/N | Name |
---|---|
1 | Jayda Silva Todd |
2 | Kerys Felix |
3 | Beauden Ventura |
4 | Giorgia Fleming |
Free Text DataFrame:
Reference No | Name |
---|---|
1 | Lorem Ipsum is simply dummy text Felix Kerys of the printing and typesetting industry. |
2 | Contrary to popular belief, Lorem Ipsum is not simply random text. |
3 | This text will return results as well although there's a slight spelling error Jayda Silva Lorem ipsum dolor sit amet, consectetur adipiscing elit |
4 | It is a long established fact that a reader will be distracted by the readable content Beauden, Ventur of a page when looking at its layout. |
Expected Output (on Free Text DataFrame):
Reference No | Name | Expected Result (from Names DataFrame) |
---|---|---|
1 | Lorem Ipsum is simply dummy text Felix Kerys of the printing and typesetting industry. | Kerys Felix |
2 | Contrary to popular belief, Lorem Ipsum is not simply random text. | "empty" |
3 | This text will return results as well although there's a slight spelling error Jayda Silva Lorem ipsum dolor sit amet, consectetur adipiscing elit | Jayda Silva Todd |
4 | It is a long established fact that a reader will be distracted by the readable content Beauden, Ventur of a page when looking at its layout. | Beauden Ventura |
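One possible way to get from the free text to the expected output is to slide a window of words across each text and score every window against every name with an order-insensitive comparison. The sketch below uses only the standard library and rests on several assumptions: the function names `tokens` and `best_name_match` are hypothetical, the window sizes are the name's word count and one word fewer (to tolerate a missing word such as "Todd"), and the `0.8` threshold is an arbitrary cut-off that would need tuning on real data.

```python
import re
from difflib import SequenceMatcher

NAMES = ["Jayda Silva Todd", "Kerys Felix", "Beauden Ventura", "Giorgia Fleming"]


def tokens(text):
    """Lowercase the text and return its word tokens (punctuation dropped)."""
    return re.findall(r"\w+", text.lower())


def best_name_match(text, names, threshold=0.8):
    """Return the best-matching name found in text, or None.

    Assumption: threshold=0.8 is an illustrative value, not a recommendation.
    """
    words = tokens(text)
    best_name, best_score = None, 0.0
    for name in names:
        name_tokens = sorted(tokens(name))
        target = " ".join(name_tokens)
        # Try windows as long as the name, and one word shorter.
        for size in {len(name_tokens), max(1, len(name_tokens) - 1)}:
            for i in range(len(words) - size + 1):
                # Sorting the window makes the comparison order-insensitive.
                window = " ".join(sorted(words[i:i + size]))
                score = SequenceMatcher(None, target, window).ratio()
                if score > best_score:
                    best_name, best_score = name, score
    return best_name if best_score >= threshold else None


texts = [
    "Lorem Ipsum is simply dummy text Felix Kerys of the printing and typesetting industry.",
    "Contrary to popular belief, Lorem Ipsum is not simply random text.",
    "This text will return results as well although there's a slight spelling error Jayda Silva Lorem ipsum dolor sit amet, consectetur adipiscing elit",
    "It is a long established fact that a reader will be distracted by the readable content Beauden, Ventur of a page when looking at its layout.",
]
for text in texts:
    print(best_name_match(text, NAMES))
# Kerys Felix
# None
# Jayda Silva Todd
# Beauden Ventura
```

With pandas, the same function could be applied row by row, for example `free_text_df["Name"].apply(lambda t: best_name_match(t, names_df["Name"]))`, where `free_text_df` and `names_df` are assumed names for the two DataFrames. The nested loops compare every window with every name, so for large datasets a dedicated fuzzy-matching library or a TF-IDF based pre-filter would likely be needed.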