0

I want to run Full-Text Search on XML files that are stored as varbinary in SQL Server 2016 to find phrases (id, names, phones etc.) in those files. The problem is that the query doesn't find UTF-16 encoded files.

I have the following tables

Table Files:

Id Filename
1 utf8-encoded.xml
2 utf16-encoded.xml

Table FileContent:

Id Content Type
1 0x444ae9b... XML
2 0x422de56... XML

and a view:

CREATE VIEW Search 
WITH SCHEMABINDING 
AS 
    SELECT 
        e.Id, e.Filename, c.[Content], c.[Type]
    FROM 
        Files AS f 
    INNER JOIN 
        FileContent AS c ON c.Id = f.Id

Then I execute a full-text search query in C# in .NET Framework 4.6.1 using Dapper:

var searchTerm = "xxxxx"; // from user
var query = "SELECT * FROM Search s WHERE CONTAINS(s.Content, @searchTerm)";
var data = connection.Query<File>(query, new {searchTerm});
...

Changing Dapper parameter to sql-injectable string concatenation like this: WHERE CONTAINS(s.Content, " + searchTerm" + ")" doesn't change anything.

The query finds UTF-8 encoded XML files but not UTF-16 encoded. I believe that this is not Dapper-related issue.

How can I fix this to be able to find UTF-16 encoded XML files too? Does SQL Server's full-text search even work with UTF-16 encoded XML files?

marc_s
  • 732,580
  • 175
  • 1,330
  • 1,459
krzycho2
  • 1
  • 1
  • 1
  • 5
    fulltext works over textual columns. So, I suggest you try changing the column to be nvarchar(max) and try again. – Conor Cunningham MSFT Dec 13 '21 at 14:11
  • @ConorCunninghamMSFT It's a production database, I can't change this table. – krzycho2 Dec 13 '21 at 14:18
  • Don't know what's in your FileContent table but it doesn't seem to be XML data (neither UTF-8 nor UTF-16 encoded)... `0x444ae9b...` is `DJé...` and `0x422de56...` is `C-å...` – AlwaysLearning Dec 13 '21 at 15:18
  • It's random hex. In real Content table there is XML data, trust me :) – krzycho2 Dec 13 '21 at 15:25
  • binary fields don't have collations or distinguish between encodings (UTF-8 vs. -16 vs. ANSI, etc). Here's an example using filetable (where the blobs are stored on the file system). It won't help your prod system without changes, but hopefully it will give you a recipe to test future designs. https://oakdome.com/programming/SQL_FileTables_FullTextSearch.php – Conor Cunningham MSFT Dec 13 '21 at 15:53
  • Depending on what type of searching you are doing, you could possibly use XQuery – Charlieface Dec 13 '21 at 18:22

0 Answers0