1

I have an SQLite database using FTS4. It is used to store emails with message id's of the form:

Searching for messages using the FTS MATCH syntax, I get a result from:

SELECT rowid FROM emails WHERE emails MATCH '<8200@comms.io>'

This returns the correct row. But when I try to find multiple emails, I get an empty response:

SELECT rowid FROM emails WHERE emails MATCH '<8200@comms.io> OR <8188@comms.io>'

Strangely though, I can search without the angle bracket characters. This returns both rows:

SELECT rowid FROM emails WHERE emails MATCH '8200@comms.io OR 8188@comms.io'

This even though the angle brackets are present in the stored columns. I can find no mention that these are special characters in SQLite, and without the 'OR', the single-term search works fine.

Why are these characters treated differently in my compound search?

Roderick
  • 1,205
  • 11
  • 24

1 Answers1

2

The default (simple) tokenizer reads alphanumerical characters and treats all others as word separators to be ignored. So when searching for a message ID, you have to actually search for a phrase with multiple words (8200, comms, and io).

If you want to treat the entire message ID as a word, you have to write a custom tokenizer.

CL.
  • 173,858
  • 17
  • 217
  • 259
  • Thanks for pointing me towards the tokenizer: according to http://www.sqlite.org/fts3.html#tokenizer, unicode characters under 128 (<,>,@) are discarded. So possibly my search is transforming to '8200 AND comms AND io'. But this would happen in both cases, with the low-precedence OR in the compound search. So that should produce a result equivalent to '( 8200 AND comms AND io) OR ( 8188 AND comms AND io)', but nothing is returned. – Roderick Jan 01 '13 at 05:26
  • This is not strictly true. There is an undocumented feature in the simple tokenizer which allows the delimiters to be customized. I posted a question about this to the [SQLite mailing list](http://sqlite.org:8080/cgi-bin/mailman/private/sqlite-users/2014-September/055419.html). – npgall Sep 17 '14 at 17:11