I'm having some trouble understanding FTS and would be thankful if someone could help me.
GOAL: Full-text search using the MATCH function.
Problem: Unable to search by extended-ASCII characters such as '#¿®£$ and so on.
Details: There are three predefined tokenizers: simple, porter, and unicode61. All of them treat special symbols as separators, because the documentation says:

A term is a contiguous sequence of eligible characters, where eligible characters are all alphanumeric characters and all characters with Unicode codepoint values greater than or equal to 128.
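To illustrate the separator behavior, here is a minimal sketch using Python's sqlite3 module (it assumes the bundled SQLite was compiled with FTS4 support and the unicode61 tokenizer, which is the case in most builds):

```python
import sqlite3

con = sqlite3.connect(":memory:")
# Default unicode61 tokenizer: '$' is not an eligible character,
# so it acts as a token separator
con.execute("CREATE VIRTUAL TABLE t USING fts4(content, tokenize=unicode61)")
con.execute("INSERT INTO t VALUES ('one doll$r')")

# 'doll$r' was indexed as two tokens, 'doll' and 'r',
# so searching for 'doll' alone finds the row
rows = con.execute("SELECT content FROM t WHERE t MATCH 'doll'").fetchall()
print(rows)
```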
Possible solution (a bad one): It is possible to specify extra symbols that should be treated either as token separators or, conversely, as part of a token:

CREATE VIRTUAL TABLE text USING FTS4(column, tokenize=unicode61 "tokenchars='$%")
After that I can find words like that's, doll$r, 60%40, etc., because the tokenizer no longer splits tokens on the ', $, and % symbols.
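Continuing the same Python sketch (again assuming an FTS4-enabled build with unicode61), adding ', $, and % to tokenchars makes doll$r a single token, so it can be matched exactly:

```python
import sqlite3

con = sqlite3.connect(":memory:")
# tokenchars adds ', $ and % to the set of eligible token characters
con.execute(
    "CREATE VIRTUAL TABLE t USING fts4("
    "content, tokenize=unicode61 \"tokenchars='$%\")"
)
con.execute("INSERT INTO t VALUES ('that''s one doll$r, a 60%40 split')")

# 'doll$r' is now indexed as a single token and can be matched as such...
found = con.execute("SELECT content FROM t WHERE t MATCH 'doll$r'").fetchall()
# ...while 'doll' alone no longer matches, since no token equals 'doll'
missed = con.execute("SELECT content FROM t WHERE t MATCH 'doll'").fetchall()
print(found, missed)
```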
But this doesn't suit me, because the extended ASCII table contains many such symbols, and listing all of them is not a good solution.
The main question: what is the best way to search by special symbols?
Thanks a lot, and feel free to ask for more details if needed.