I am trying to find regex patterns in text with quanteda, but kwic() does not match regex patterns that span more than one word. I also tried wrapping the pattern in phrase(), but that did not work either.
To give you an example:
mycorpus <- corpus(bla$TEXT)
foo <- kwic(mycorpus, pattern = "\\bno\\b", window = 10, valuetype = "regex")  # gives 1959 obs.
foo <- kwic(mycorpus, pattern = "\\bno\\b\\s{0,5}\\w+", window = 10, valuetype = "regex")  # gives 0 obs.
foo <- kwic(mycorpus, pattern = "no\\sother", window = 10, valuetype = "regex")  # gives 0 obs. even though it should find 3 phrases
The last two calls return nothing, even though the text contains multiple phrases that should match.
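Since I cannot share my data, here is a small sketch that reproduces the behaviour for me. The sentence is made up (it is not from bla$TEXT), and I am assuming a recent quanteda version, where kwic() is called on a tokens object rather than directly on the corpus.

```r
library(quanteda)

# Made-up toy text standing in for bla$TEXT (hypothetical data)
toy <- corpus(c(d1 = "There is no other way. We have no time and no other plan."))
toks <- tokens(toy)

# Single-word regex: finds the token "no"
one_word <- kwic(toks, pattern = "\\bno\\b", window = 3, valuetype = "regex")

# Two-word regex with whitespace in between: finds nothing
two_word <- kwic(toks, pattern = "no\\sother", window = 3, valuetype = "regex")

nrow(one_word)
nrow(two_word)
```

The single-word pattern returns matches here, while the pattern containing \\s returns none, which is the same behaviour I see on my real corpus.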
Thanks for the help!