I have a fulltext index on a binary field that stores superscript and subscript numbers. When searching the field, the same results are returned whether you add super/sub script numbers to your search or not.
Some details on how this is structured:
I have a table where I am storing HTML but the text within it needs to be searchable (so when searching for "Text", <b>T</b>ext
will match the search as well as Text
, therefore as well as a nvarchar(max) column with the HTML stored as text, I also have a varbinary(max) that stores the HTML converted into binary with a fulltext index against it (as described here http://blog.leifbattermann.de/2016/09/23/howto-implement-full-text-search-html-documents-microsoft-sql-server).
The HTML includes some scientific formulas, such as electronic structures that use superscript and subscript numbers (like this: Ar 3d¹⁰ 4s² 4p²). The super/sub script numbers are entered by the user as text by pasting from character map etc (rather than using HTML encoding).
If I query sys.dm_fts_index_keywords
it looks like this doesn't store the super/sub script, so it's not a surprise they can't be found. However it does store other unicode characters, and other super/sub script characters, it just seems to be super/sub script numbers.
In the above example, WHERE FREETEXT (HTMLTextBin, '4p²')
and WHERE FREETEXT (HTMLTextBin, '4p')
both return the record, WHERE FREETEXT (HTMLTextBin, '4p2')
does not. I would expect only the first one to match.
Does anyone know of a setting that I'm missing? Or another way of being able to search that would inlcude these characters?
Many thanks in advance