1

I have a fulltext index on a binary field that stores superscript and subscript numbers. When searching the field, the same results are returned whether you add super/sub script numbers to your search or not.

Some details on how this is structured: I have a table where I am storing HTML but the text within it needs to be searchable (so when searching for "Text", <b>T</b>ext will match the search as well as Text, therefore as well as a nvarchar(max) column with the HTML stored as text, I also have a varbinary(max) that stores the HTML converted into binary with a fulltext index against it (as described here http://blog.leifbattermann.de/2016/09/23/howto-implement-full-text-search-html-documents-microsoft-sql-server).

The HTML includes some scientific formulas, such as electronic structures that use superscript and subscript numbers (like this: Ar 3d¹⁰ 4s² 4p²). The super/sub script numbers are entered by the user as text by pasting from character map etc (rather than using HTML encoding). If I query sys.dm_fts_index_keywords it looks like this doesn't store the super/sub script, so it's not a surprise they can't be found. However it does store other unicode characters, and other super/sub script characters, it just seems to be super/sub script numbers. In the above example, WHERE FREETEXT (HTMLTextBin, '4p²') and WHERE FREETEXT (HTMLTextBin, '4p') both return the record, WHERE FREETEXT (HTMLTextBin, '4p2') does not. I would expect only the first one to match.

Does anyone know of a setting that I'm missing? Or another way of being able to search that would inlcude these characters?

Many thanks in advance

0 Answers0