I have a data frame with a single text column:
TERM
good morning
hello
morning good
you're welcome
hello
hi
I would like to remove all duplicates, including terms that contain the same words in a different order, so that I get:
TERM
good morning
hello
you're welcome
hi
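For reproducibility, here is the example above as R code. The object names `df` and `expected` are placeholders I chose for this post, not names from my real data.

```r
# Example input: a data frame with one character column named TERM
df <- data.frame(
  TERM = c("good morning", "hello", "morning good",
           "you're welcome", "hello", "hi"),
  stringsAsFactors = FALSE
)

# Desired result: the first occurrence of each term is kept, and
# "morning good" is dropped because it has the same words as "good morning"
expected <- data.frame(
  TERM = c("good morning", "hello", "you're welcome", "hi"),
  stringsAsFactors = FALSE
)
```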
I know how to get the distance between two strings with stringdist:
stringdist(stringOriginal, stringCompare, method = "qgram")
But since my data frames are very long, I don't want to loop through all entries.
How can I filter out the similar terms?
Thanks, Joerg