What characters does the standard tokenizer delimit on?

Question

I was wondering which characters Elasticsearch's standard tokenizer uses to split a string into tokens.

score 6 · Accepted Answer · answered Sep 23 '15 at 14:28


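As per the documentation, the standard tokenizer does not split on a fixed list of delimiter characters. Instead, it applies the word boundary rules of the Unicode Text Segmentation algorithm (Unicode Standard Annex #29), described here: http://unicode.org/reports/tr29/#Default_Word_Boundaries

In practice, whitespace and most punctuation end a token. Some characters do not, depending on their context: an apostrophe or period between two letters (dog's, example.com), a comma or period between two digits (1,000, 3.14), and an underscore joining letters or digits (foo_bar).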
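The following sketch shows why the answer is a set of context rules rather than a character list. It is a minimal Python approximation of a small subset of the UAX #29 word-break rules (WB5 to WB13b). It assumes Latin-like scripts only. It ignores combining marks (WB4), Hebrew letters, Katakana, ideographs, emoji, and most MidLetter/MidNum characters, so it will not match Lucene's StandardTokenizer on all input. The names `word_class`, `is_boundary`, and `tokenize` are hypothetical. As the real tokenizer does, it keeps only segments that contain a letter or digit.

```python
import unicodedata

# Small, hypothetical subsets of the UAX #29 Word_Break classes (approximation only).
MID_NUM_LET = {".", "'", "\u2018", "\u2019"}  # MidNumLet, plus Single_Quote (')
MID_NUM = {",", ";"}                           # MidNum


def word_class(ch):
    """Map a character to a simplified Word_Break class."""
    if ch == "_":
        return "ExtendNumLet"
    cat = unicodedata.category(ch)
    if cat.startswith("L"):
        return "ALetter"  # approximation: ideographs/Katakana are not handled specially
    if cat == "Nd":
        return "Numeric"
    if ch in MID_NUM_LET:
        return "MidNumLet"
    if ch in MID_NUM:
        return "MidNum"
    return "Other"


def is_boundary(cls, i):
    """Return True if there is a word boundary between cls[i - 1] and cls[i]."""
    prev, cur = cls[i - 1], cls[i]
    prev2 = cls[i - 2] if i >= 2 else None
    nxt = cls[i + 1] if i + 1 < len(cls) else None
    alnum = ("ALetter", "Numeric")
    if prev in alnum and cur in alnum:
        return False  # WB5, WB8, WB9, WB10: letters/digits stay together
    if prev == "ALetter" and cur == "MidNumLet" and nxt == "ALetter":
        return False  # WB6: "dog|'s"
    if prev2 == "ALetter" and prev == "MidNumLet" and cur == "ALetter":
        return False  # WB7: "dog'|s"
    if prev == "Numeric" and cur in ("MidNum", "MidNumLet") and nxt == "Numeric":
        return False  # WB12: "1|,000"
    if prev2 == "Numeric" and prev in ("MidNum", "MidNumLet") and cur == "Numeric":
        return False  # WB11: "1,|000"
    if prev in alnum + ("ExtendNumLet",) and cur == "ExtendNumLet":
        return False  # WB13a: "foo|_"
    if prev == "ExtendNumLet" and cur in alnum:
        return False  # WB13b: "_|bar"
    return True  # WB999: break everywhere else


def tokenize(text):
    """Split text at the boundaries above and keep segments with a letter or digit."""
    if not text:
        return []
    cls = [word_class(ch) for ch in text]
    segments, start = [], 0
    for i in range(1, len(text)):
        if is_boundary(cls, i):
            segments.append(text[start:i])
            start = i
    segments.append(text[start:])
    # Drop whitespace/punctuation-only segments, as the tokenizer does not emit them.
    return [s for s in segments
            if any(word_class(c) in ("ALetter", "Numeric") for c in s)]


if __name__ == "__main__":
    print(tokenize("The 2 QUICK Brown-Foxes jumped over the lazy dog's bone."))
```

Here the hyphen in "Brown-Foxes" always splits, while the apostrophe in "dog's" does not, because rules WB6/WB7 look at the characters on both sides. To see what Elasticsearch itself produces for a given text, the `_analyze` API with `"tokenizer": "standard"` returns the real token list.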


Andrei Stefan
