qdap::mgsub takes the following parameters:
mgsub(x, pattern, replacement)
Within a library(tm) corpus transformation you can wrap non-tm functions in content_transformer(), e.g.
corpus <- tm_map(corpus, content_transformer(tolower))
Here is a data frame with some poorly spelt text:
df <- data.frame(
id = 1:2,
sometext = c("[cad] appls", "bannanas")
)
And here is a data frame with a custom lookup for misspelt words:
spldoc <- data.frame(
incorrects = c("appls", "bnnanas"),
corrects = c("apples", "bannanas")
)
Using mgsub outwith the context of corpus and content_transformer() I could just do this:
library(dplyr) # select() and %>%
library(qdap) # mgsub()
wrongs <- select(spldoc, incorrects)[, 1] %>% paste0("\\b", ., "\\b") # prepend and append \\b to create a word-boundary regex
rights <- select(spldoc, corrects)[, 1]
df$sometext <- mgsub(wrongs, rights, df$sometext, fixed = FALSE)
But I can't see how to write mgsub inside a function that I can pass to content_transformer().
What would my parameter for x be, as in mgsub(x, pattern, replacement)?
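To make the question concrete, here is a minimal sketch of the kind of wrapper I have in mind. It assumes content_transformer() hands the document text to the wrapped function as its first argument and that tm_map() forwards any extra named arguments. fix_spelling is a hypothetical helper name, not part of qdap or tm, and it loops over base gsub() rather than calling qdap::mgsub, so its replacement order may differ from mgsub's.

```r
# Hypothetical helper (not from qdap or tm): the text comes first, followed by
# the vectors of regex patterns and their replacements.
fix_spelling <- function(x, pattern, replacement) {
  stopifnot(length(pattern) == length(replacement))
  for (i in seq_along(pattern)) {
    # Apply each pattern in turn; the \\b anchors leave partial words alone.
    x <- gsub(pattern[i], replacement[i], x, perl = TRUE)
  }
  x
}

# Lookup table from above; stringsAsFactors = FALSE keeps the columns character.
spldoc <- data.frame(
  incorrects = c("appls", "bnnanas"),
  corrects = c("apples", "bannanas"),
  stringsAsFactors = FALSE
)
wrongs <- paste0("\\b", spldoc$incorrects, "\\b")
rights <- spldoc$corrects

# Plain character vector standing in for the corpus content.
fixed_text <- fix_spelling(c("[cad] appls", "bannanas"), wrongs, rights)
print(fixed_text)

# Assumed tm usage (not run here): extra arguments given to tm_map() are
# expected to be passed through to the wrapped function.
# corpus <- tm_map(corpus, content_transformer(fix_spelling),
#                  pattern = wrongs, replacement = rights)
```

If that assumption about argument passing holds, x would simply be the document content and I would never supply it myself. Is that right?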