
I have a misunderstanding about FTS, and I'd be grateful if someone could help me.

Goal: full-text search using the MATCH operator.

Problem: I am unable to search for special and extended ASCII characters such as '#¿®£$ and so on.

Details: There are three predefined tokenizers: simple, porter and unicode61. As far as I can tell, all of them treat special symbols as separators, because the documentation says:

A term is a contiguous sequence of eligible characters, where eligible characters are all alphanumeric characters and all characters with Unicode codepoint values greater than or equal to 128.
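To make the quoted rule concrete, here is a rough Python sketch of it. This is only an approximation of what the documentation describes, not the actual SQLite code; the function name `simple_tokens` is made up for illustration, and I am assuming "alphanumeric" means ASCII letters and digits, with ASCII letters folded to lowercase.

```python
def simple_tokens(text):
    """Approximate the documented rule: a term is a run of ASCII
    alphanumerics and/or characters with code point >= 128."""
    tokens = []
    current = []
    for ch in text:
        eligible = ord(ch) >= 128 or (ch.isascii() and ch.isalnum())
        if eligible:
            # Assumption: only ASCII letters are lowercased.
            current.append(ch.lower() if ch.isascii() else ch)
        elif current:
            # Any other character ends the current term.
            tokens.append("".join(current))
            current = []
    if current:
        tokens.append("".join(current))
    return tokens


ascii_case = simple_tokens("that's doll$r 60%40")
extended_case = simple_tokens("£5 ¿qué?")
```

Under this reading, `'`, `$` and `%` split the text into separate terms, while characters such as `£` and `¿` stay attached to the neighbouring letters or digits.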

Possible solution (a bad one): the unicode61 tokenizer lets you specify extra characters that should be treated either as separators or as part of a token.

CREATE VIRTUAL TABLE text USING FTS4(column, tokenize=unicode61 "tokenchars='$%")

After that I can find words such as that's, doll$r and 60%40, because the tokenizer no longer splits tokens on the ', $ and % characters.
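For anyone who wants to reproduce this outside Android, here is a minimal sketch using Python's built-in `sqlite3` module. It assumes the underlying SQLite library was built with FTS4 and the unicode61 tokenizer; the table names `plain` and `custom` and the column name `body` are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# "plain" uses the unicode61 defaults; "custom" additionally treats
# ', $ and % as token characters (same option as in the question).
conn.execute("CREATE VIRTUAL TABLE plain USING fts4(body, tokenize=unicode61)")
conn.execute(
    """CREATE VIRTUAL TABLE custom USING fts4(
           body, tokenize=unicode61 "tokenchars='$%")"""
)

rows = [("that's",), ("doll$r",), ("60%40",)]
for table in ("plain", "custom"):
    conn.executemany(f"INSERT INTO {table}(body) VALUES (?)", rows)


def search(table, query):
    """Run a MATCH query against one of the two hypothetical tables."""
    sql = f"SELECT body FROM {table} WHERE {table} MATCH ?"
    return [row[0] for row in conn.execute(sql, (query,))]


plain_hits = search("plain", "doll")        # "doll$r" was split into doll + r
custom_hits = search("custom", "doll")      # "doll$r" was kept as one token
custom_exact = search("custom", "doll$r")   # query goes through the same tokenizer
```

With the default settings a search for `doll` should find the `doll$r` row, because `$` acted as a separator; with `tokenchars` set it should not, and only the full `doll$r` query should match.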

But this doesn't suit me, because there are a lot of extended symbols in the ASCII table and listing all of them is not a good solution.

The main question: what is the best way to search for special symbols?

Thanks a lot, and feel free to ask for more details if needed.

AinisSK
  • Write a custom tokenizer. You have to do it in C, but it's doable. But don't expect a simple and good answer here; tokenizing is a very non-trivial problem. That's why the built-in ones are so-so. – Gabe Sechan May 02 '17 at 20:37
  • @GabeSechan thanks for the answer. I've thought about writing a custom tokenizer, but honestly it seems to me that it'll take a long time. One more question: from your point of view, is it a good decision to use these features for global search in an application? I'm a bit confused: there is an [article](https://developer.android.com/training/search/search.html#search) which says to use virtual tables for searching data, but what about special symbols? I can't believe people don't use them, especially the `'` symbol. – AinisSK May 03 '17 at 06:23
  • @GabeSechan This does not work in Android without compiling and shipping your own copy of the SQLite library. – CL. May 03 '17 at 08:00
  • @AinisSK What you quoted applies only to the `simple` tokenizer. – CL. May 03 '17 at 08:01
  • @CL. The documentation says: _As well as the "simple" tokenizer, the FTS source code features a tokenizer that uses the Porter Stemming algorithm._ And I suppose the same holds for `unicode61`, because when I used it I had the same problems with special symbols. – AinisSK May 03 '17 at 08:07
  • @CL then ship your own SQLite. I'm doing it right now because the built-in one only supports FTS4, not FTS5. The SQLite people even have it precompiled with JNI bindings for Android. – Gabe Sechan May 03 '17 at 12:41
