I am text mining a large database to create indicator variables that flag whether certain phrases occur in the comments field of each observation. The comments were entered by technicians, so the terminology is generally consistent.
However, in some cases a technician misspelled a word, so my grepl() call does not detect that the phrase (albeit misspelled) occurred in the observation. Ideally, I would like to pass each word in a phrase to a function that returns several common misspellings or typos of that word. Does such an R function exist?
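To make the problem concrete, here is a minimal example. The comments and the phrase are made up for illustration and are not my real data:

```r
# Made-up comments for illustration only (not my real data)
comments <- c("replaced the compressor belt",
              "replaced the compresor belt",  # "compressor" misspelled
              "no fault found")

# Exact phrase matching flags the first comment but misses the misspelled one
has_phrase <- grepl("compressor belt", comments, fixed = TRUE)
has_phrase
```

The second comment describes the same thing as the first, but the indicator comes back FALSE for it.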
With this, I could search the comments field for all possible combinations of these misspellings of the phrase and write the matches to another data frame. That way, I could review each occurrence case by case to determine whether the technician actually described the phenomenon I am interested in.
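Here is a rough sketch of the workflow I have in mind. Note that `typo_variants()` and `phrase_pattern()` are hypothetical helpers I wrote by hand, not functions from any package, and they only cover two simple kinds of typo (one dropped letter, or two adjacent letters swapped). The data is made up as well, and the helpers assume plain, non-empty alphabetic words:

```r
# Hypothetical helper: single-edit typos of a word
# (one letter deleted, or two adjacent letters swapped)
typo_variants <- function(word) {
  chars <- strsplit(word, "", fixed = TRUE)[[1]]
  n <- length(chars)
  deletions <- vapply(seq_len(n), function(i) {
    paste(chars[-i], collapse = "")
  }, character(1))
  swaps <- vapply(seq_len(n - 1), function(i) {
    x <- chars
    x[c(i, i + 1)] <- x[c(i + 1, i)]
    paste(x, collapse = "")
  }, character(1))
  # Drop duplicates and the correctly spelled word itself
  setdiff(c(deletions, swaps), word)
}

# Hypothetical helper: regex matching every combination of
# correct or misspelled words in a space-separated phrase
phrase_pattern <- function(phrase) {
  words <- strsplit(phrase, " ", fixed = TRUE)[[1]]
  alts <- vapply(words, function(w) {
    paste0("(?:", paste(c(w, typo_variants(w)), collapse = "|"), ")")
  }, character(1))
  paste0("\\b", paste(alts, collapse = " "), "\\b")
}

# Made-up data for illustration only
obs <- data.frame(
  id = 1:3,
  text = c("replaced the compressor belt",
           "replaced the compresor belt",
           "no fault found"),
  stringsAsFactors = FALSE
)

phrase <- "compressor belt"
fuzzy <- grepl(phrase_pattern(phrase), obs$text, perl = TRUE)
exact <- grepl(phrase, obs$text, fixed = TRUE)

# Rows that only match a misspelled form, set aside for manual review
review <- obs[fuzzy & !exact, ]
review
```

This works for a handful of phrases, but the set of typos it covers is arbitrary, which is why I am hoping a package already generates plausible misspellings more systematically.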
I have Googled around, but have only found references to actual spell checking packages for R. What I am looking for is an "inverse" spell checker. Since the number of phrases I am looking for is relatively small, I would realistically be able to check for misspellings by hand; I just figured it would be nice to have this ability built into an R package for future text mining efforts.
Thank you for your time!