I'm trying to build a function that uses R's agrepl for approximate matching.
I am passing it a regex pattern, but as far as I can tell the pattern is not being treated as a regex.
I came to this conclusion by running the following test in my REPL:
> patterns <- c("ha", "^ha", "ha$", "^ha$", "(^)ha", "ha($)")
> sapply(patterns, agrepl, x = "ha", max.distance = 0L, fixed = FALSE)
   ha   ^ha   ha$  ^ha$ (^)ha ha($) 
 TRUE  TRUE  TRUE  TRUE FALSE FALSE 
> sapply(patterns, grepl, x = "ha", fixed = FALSE)
   ha   ^ha   ha$  ^ha$ (^)ha ha($) 
 TRUE  TRUE  TRUE  TRUE  TRUE  TRUE 
I'm not that good with regex, but I'm pretty sure that all of my patterns should match "ha".
Assuming that I'm right and the above behavior should not be happening, can you propose another function or approach that matches my patterns against "ha"?
To be more specific, I need a fuzzy matcher that will help me find keywords in unstructured data.
UPDATE: I should point out that the only reason I'm using regular expressions is that I am looking for keywords (matches with spaces around them). If I can ensure that "haha" will not match "ha" but "ha foo" will, then regex is not necessary for this problem.
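To make that requirement concrete, here is a minimal sketch of the behavior I'm after. It assumes that splitting the text on whitespace and comparing each token to the keyword as a whole string is acceptable. The helper name `has_keyword` and its `max_edits` argument are hypothetical, and `max_edits` is a plain edit count, not the same thing as agrepl's `max.distance`.

```r
# Hypothetical helper (not part of base R): returns TRUE if any
# whitespace-delimited token in `text` is within `max_edits` edits of `keyword`.
has_keyword <- function(keyword, text, max_edits = 0L) {
  # Split the text into tokens on runs of whitespace
  tokens <- strsplit(text, "[[:space:]]+")[[1]]
  tokens <- tokens[nzchar(tokens)]
  if (length(tokens) == 0L) return(FALSE)
  # utils::adist() gives the Levenshtein distance between whole strings,
  # so a token like "haha" is never treated as containing "ha"
  any(utils::adist(keyword, tokens) <= max_edits)
}

has_keyword("ha", "haha")    # FALSE
has_keyword("ha", "ha foo")  # TRUE
```

With `max_edits = 1L` a token such as "hx" would also count as a match for "ha", which is the kind of fuzziness I want, but only at the level of whole words.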