I am using the kwic() function from the quanteda package in R to look up some phrases in Kurdish. In Kurdish, some compound words and phrases are separated by a half-space. When I use a phrase that includes a half-space, R flags it as a typo (the red dot) and does not let me run the command. Is there a way to fix this?
The half-space, or zero-width non-joiner (ZWNJ), is used in some languages to keep adjacent letters from joining into a ligature. Its Unicode code point is U+200C (written '\u200c' in an R string), and in some text editors it can be typed with SHIFT+SPACE.
kwic(cleantest, phrase("لهلایهنی"), window = 1)
Here is the image of the error
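For reference, here is a minimal sketch of one thing that might work, assuming the editor only objects to the literal invisible character in the script: build the search string with the escape '\u200c' instead of typing the half-space. The names left_part and right_part are hypothetical placeholders, not real Kurdish words, and I have not confirmed how quanteda's tokenizer treats the ZWNJ, so the kwic() call is left as a comment.

```r
# Write the ZWNJ as an escape so no invisible character appears in the script.
zwnj <- "\u200c"

# Hypothetical placeholders standing in for the two parts of a compound word.
left_part  <- "left"
right_part <- "right"

# Join the parts with the ZWNJ between them.
pattern <- paste0(left_part, zwnj, right_part)

# Check that the string really contains U+200C (decimal 8204).
has_zwnj <- grepl(zwnj, pattern, fixed = TRUE)
code_point <- utf8ToInt(zwnj)

# Assumption: cleantest is the object from my call above.
# kwic(cleantest, phrase(pattern), window = 1)
```

Whether this finds any matches would also depend on whether the ZWNJ survives tokenization in cleantest, which I have not checked.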
Also, does anyone know of a POS tagger and a stemmer for Sorani Kurdish?