My question builds on the topic of matching a string against multiple patterns. One solution discussed here is to use sapply(keywords, grepl, strings, ignore.case=TRUE), which yields a two-dimensional matrix.
However, I run into significant speed issues when applying this approach to 5K+ keywords and 60K+ strings (I cancelled the process after 12 hours).
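A minimal sketch of the sapply()/grepl() approach on toy data shows the shape of the result: one row per string and one column per keyword. The keywords and strings below are made up for illustration. Each keyword triggers a separate grepl() pass over every string. With 5,000 keywords and 60,000 strings, that is 5,000 passes and 3 × 10^8 string/pattern checks, which may help explain the long run time.

```r
# Hypothetical toy data; the real data has 5K+ keywords and 60K+ strings
keywords <- c("apple", "Banana", "cherry")
strings  <- c("I like apples", "banana split", "no fruit here", "Cherry and APPLE pie")

# One grepl() call per keyword; each call scans every string,
# so the work grows with length(keywords) * length(strings).
m <- sapply(keywords, grepl, strings, ignore.case = TRUE)

dim(m)        # 4 rows (one per string) x 3 columns (one per keyword)
colnames(m)   # the keywords, because sapply() uses them as names
m[4, ]        # which keywords match the 4th string
```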
One idea is to use hash tables, i.e. environments in R. However, I don't understand how to convert my strings into an environment while keeping the numerical index.
I have strings[1] through strings[60000].
e <- new.env(hash=TRUE)
for (i in seq_along(strings)) {
  # fails: 'x' in assign() must be a character string, not the number i
  assign(x=i, value=strings[i], envir=e)
}
As x in assign must be a character string, I can't use it like this, but I hope you get my idea. I want to be able to index the environment with the same numbers as my strings[...] vector.
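A minimal sketch, assuming the goal is simply to key each string by its position: converting the index with as.character() satisfies assign()'s requirement, and the same string key is then used for lookup. The three example strings are hypothetical, and list2env() is shown as a loop-free alternative.

```r
strings <- c("first string", "second string", "third string")  # hypothetical data

e <- new.env(hash = TRUE)
for (i in seq_along(strings)) {
  # Environment keys must be character, so convert the index to a string key
  assign(as.character(i), strings[i], envir = e)
}

# Look up with the same number used to index the vector, as a string
get(as.character(2), envir = e)   # "second string"
e[["3"]]                          # "third string"

# Loop-free alternative: name the list elements "1", "2", ... and convert it
e2 <- list2env(setNames(as.list(strings), seq_along(strings)), hash = TRUE)
```

These keys reproduce the positional lookup that strings[i] already provides; an environment's hashing mainly pays off when looking values up by arbitrary string keys.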
Thanks for your help!