My goal is to identify whether a given text contains a target string, allowing for typos / small deviations, and to extract the substring that "caused" the match (to use it for further text analysis).
Example:
target <- "target string"
text <- "the target strlng: Butter. this text i dont want to extract."
Desired output:
I would like to get "target strlng" as the output, since it is very close to the target (Levenshtein distance of 1). Next, I want to use "target strlng" to extract the word "Butter" (I have this part covered; I only mention it to give a complete spec).
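For completeness, a hypothetical sketch of that second step, assuming the value always follows the matched phrase as ": <value>." (a format inferred from this single example, not from any general spec). The matched substring is hard-coded here rather than computed.

```r
text    <- "the target strlng: Butter. this text i dont want to extract."
matched <- "target strlng"  # hard-coded stand-in for the fuzzy-match result

# Locate the matched phrase literally (fixed = TRUE: no regex interpretation)
start <- regexpr(matched, text, fixed = TRUE)
stopifnot(start > 0)  # the phrase must occur in text

# Everything after the matched phrase, e.g. ": Butter. this text i dont want to extract."
rest <- substring(text, start + nchar(matched))

# ASSUMED format: a colon, optional whitespace, then the value up to the next period
value <- sub("^:\\s*([^.]*)\\..*$", "\\1", rest)
value
```

If the real texts use other separators, only the pattern passed to sub() would need to change.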
What I tried:
Using adist did not work, since it compares whole strings, not substrings.
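A small check of that observation, as a sketch: by default adist() compares the complete strings, but it also accepts a partial argument that scores the target against the best-matching substring of the text. This yields the distance only, not the matched text, so on its own it still does not answer the question.

```r
target <- "target string"
text   <- "the target strlng: Butter. this text i dont want to extract."

# Default (partial = FALSE): edit distance between the two complete strings
full <- utils::adist(target, text)

# partial = TRUE: minimal edit distance between target and any substring of text
part <- utils::adist(target, text, partial = TRUE)
part
```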
Next I looked at agrep, which seems very close. It tells me that my target was found, but not which substring "caused" the match.
I tried value = TRUE, but it seems to work at the vector level: it returns the whole matching element of x rather than the matching part. I don't think I can switch to a vector of words instead, because I cannot split by spaces (my target string might contain spaces, ...).
agrep(
pattern = target,
x = text,
value = TRUE
)
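One possible approach, sketched under the assumption that base R's approximate regex matching is acceptable: utils::aregexec() is the approximate counterpart of regexec(), so it returns the position and length of the match instead of only whether one exists, and regmatches() can then cut that substring out of the text. The value max.distance = 1 mirrors the Levenshtein distance of 1 in the example; it is a tuning choice, not a requirement.

```r
target <- "target string"
text   <- "the target strlng: Butter. this text i dont want to extract."

# Approximate match: at most one insertion, deletion or substitution.
# fixed = TRUE treats target as a literal string, not a regular expression.
# The result is a list with one match position (plus "match.length") per element of text.
m <- utils::aregexec(target, text, max.distance = 1, fixed = TRUE)

# Convert the positions into the matched substrings (one list entry per element of text)
hits <- regmatches(text, m)
hits[[1]]
```

If nothing lies within max.distance, the list entry is character(0), so check length(hits[[1]]) before using the result. Because text can be a character vector, the same call handles several documents at once.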