0

Using a FTS5 virtual table returns nothing for postfix searches.

It only can search for the entire word tokens, or for the prefixes of the word tokens if I append * to the search.

For example, it does not find qwerty.png row, if I search for werty.

CREATE TABLE IF NOT EXISTS files (name TEXT, id INTEGER);
INSERT INTO files (name, id) VALUES ('qwerty.png', 1),  ('asdfgh.png', 2);
CREATE VIRTUAL TABLE IF NOT EXISTS names USING FTS5(name);
INSERT INTO names (name) SELECT name FROM files;
SELECT *
FROM names
WHERE name MATCH 'werty';

It only works for prefix searches (qwerty, qwer*, qwe*, ...).

I can't use * at the start of the search (*werty), since it produces an error.


Is possibly to make the indexed text search working as if I would use

SELECT *
FROM names
WHERE name like '%wert%';

?

I just want to have the fast search for a substring without the full table scan.

KeyKi
  • 2,693
  • 3
  • 12
  • 24

1 Answers1

0

Perhaps try the experimental trigram tokenizer

When using the trigram tokenizer, a query or phrase token may match any sequence of characters within a row, not just a complete token.

DinoCoderSaurus
  • 6,110
  • 2
  • 10
  • 15
  • Yeah, it works exactly as I asked. Although, the virtual table size is doubled in comparison with the default `tokenize="unicode61"`. For example, the table with only text column _(and rowid)_ takes 55 MB, the virtual FTS5 table additionally takes 100 MB with `unicode61` tokenizer, or 200 MB with `trigram` tokenizer. – KeyKi Feb 07 '23 at 15:04