I have a data frame with some text data, e.g.:

    words <- data.frame(terms = c("qhick brown fox",
                                  "tom dick harry",
                                  "cats dgs",
                                  "qhick black fox"))

I can already subset to the rows that contain any spelling error:

    library(qdap)
    # unique() avoids duplicated rows when one row contains several misspellings
    words[unique(check_spelling(words$terms)$row), , drop = FALSE]

However, because I have a lot of text data, I want to filter only on spelling errors that occur frequently (more than once):

    > sort(which(table(which_misspelled(toString(unique(words$terms)))) > 1), decreasing = T)
    qhick
        2

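One caveat about the command above: `which()` returns the position of each qualifying entry within the table, not its count, so the `2` printed under `qhick` is an index (which here happens to coincide with the count), and `sort()` then orders by position rather than by frequency. Below is a minimal sketch that keeps the counts instead. `frequent_misspellings()` is a hypothetical helper, and the `misspelled` vector is an assumed stand-in for what `which_misspelled()` might return on the example data; in practice you would pass it `which_misspelled(toString(unique(words$terms)))`.

```r
# Hypothetical helper: given a character vector of misspelled words (for
# example, the output of qdap::which_misspelled()), return the words that
# occur at least `min_count` times, with their counts, most frequent first.
frequent_misspellings <- function(misspelled, min_count = 2) {
  counts <- table(unname(misspelled))    # word -> number of occurrences
  counts <- counts[counts >= min_count]  # index the table to keep the counts
  sort(counts, decreasing = TRUE)
}

# Assumed stand-in for which_misspelled() output on the example data
misspelled <- c("qhick", "dgs", "qhick")
freq <- frequent_misspellings(misspelled)
freq          # qhick with a count of 2
names(freq)   # the words themselves, ready for subsetting
```

Indexing the table with a logical condition, rather than wrapping it in `which()`, is what preserves the counts as values.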
So now I know that "qhick" is a common misspelling.
How can I then subset `words` based on this table, i.e. return only the rows that contain "qhick"?
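One way to keep only the rows that contain one of the frequent misspellings is to build a single regular expression from those words and filter with `grepl()`. This is a base-R sketch under stated assumptions: `rows_containing()` is a hypothetical helper, and `frequent` is assumed to hold the words you care about (for example, `names()` of a count table like the one above).

```r
words <- data.frame(terms = c("qhick brown fox",
                              "tom dick harry",
                              "cats dgs",
                              "qhick black fox"),
                    stringsAsFactors = FALSE)

# Hypothetical helper: keep the rows of `df` whose column `col` contains any
# of the words in `targets` as a whole word (case-insensitive).
# Assumes `targets` contains no regex metacharacters (plain letters only).
rows_containing <- function(df, col, targets) {
  if (length(targets) == 0) {
    return(df[0, , drop = FALSE])  # an empty alternation would match every row
  }
  pattern <- paste0("\\b(", paste(targets, collapse = "|"), ")\\b")
  df[grepl(pattern, df[[col]], ignore.case = TRUE, perl = TRUE), , drop = FALSE]
}

# Assumed input: the frequent misspellings found earlier
frequent <- c("qhick")
frequent_rows <- rows_containing(words, "terms", frequent)
frequent_rows  # keeps rows 1 ("qhick brown fox") and 4 ("qhick black fox")
```

The `\\b` word boundaries stop `qhick` from matching inside a longer token. If you already have the `check_spelling()` result as `cs <- check_spelling(words$terms)`, another route (assuming its `not.found` column holds each flagged word) is `words[unique(cs$row[cs$not.found %in% frequent]), , drop = FALSE]`.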