I am working with the Microsoft Cognitive Services Language Understanding Intelligent Service (LUIS) API at LUIS.ai.
Whenever LUIS parses text, it inserts whitespace around punctuation in the entities it returns.
According to the documentation, this behavior is intentional:
"English, French, Italian, Spanish: token breaks are inserted at any whitespace, and around any punctuation."
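To make the documented rule concrete, here is a rough approximation of it in Python: split at whitespace, treat every punctuation character as its own token, then rejoin the tokens with single spaces. This is only an illustration of the quoted rule under that assumption; it is not LUIS's actual tokenizer, and the function names are hypothetical.

```python
import re
from typing import List

# Hypothetical approximation of the documented rule: a word is a run of
# word characters, and every punctuation character is a separate token.
# Whitespace only separates tokens and is never kept.
_TOKEN_RE = re.compile(r"\w+|[^\w\s]")


def approximate_luis_tokens(text: str) -> List[str]:
    """Split text roughly the way the documentation describes."""
    return _TOKEN_RE.findall(text)


def approximate_luis_entity_text(text: str) -> str:
    """Rejoin the tokens with single spaces, mimicking the spaced-out entity text."""
    return " ".join(approximate_luis_tokens(text))


print(approximate_luis_entity_text("St. Louis, MO"))  # St . Louis , MO
```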
For my project I need to preserve the original query text without these extra spaces. Some of the entities my model is trained on include punctuation, and stripping the extra whitespace from every parsed entity afterwards is tedious and a bit hacky.
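One workaround that avoids post-processing the spaced-out text is to slice the entity straight out of the original query using its character offsets. The sketch below assumes the response is shaped like the LUIS v2 JSON, where each entity carries `startIndex` and `endIndex` offsets into the original query and `endIndex` is inclusive; verify both assumptions against your own responses. The `entity` dictionary in the usage example is a hand-written, hypothetical payload, not real LUIS output.

```python
from typing import Any, Dict


def original_entity_text(query: str, entity: Dict[str, Any]) -> str:
    """Return the entity exactly as it appears in the original query.

    Assumes startIndex/endIndex are character offsets into `query`
    and that endIndex is inclusive (the LUIS v2 response convention).
    """
    start = entity["startIndex"]
    end = entity["endIndex"]
    if not (0 <= start <= end < len(query)):
        raise ValueError(f"offsets {start}..{end} fall outside the query")
    # +1 because endIndex is assumed to be inclusive.
    return query[start:end + 1]


# Hypothetical example payload for illustration only.
query = "Ship it to St. Louis, MO please"
entity = {"entity": "st . louis , mo", "type": "Location",
          "startIndex": 11, "endIndex": 23}

print(original_entity_text(query, entity))  # St. Louis, MO
```

This leaves the entity text returned by LUIS untouched and relies only on the offsets, so punctuation and original spacing come back exactly as the user typed them, provided the offset assumptions hold.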
Example of this behavior:
Is there a way to disable this? It would save quite a bit of effort.
Thanks!!