1

I have a table of keywords in a SQL Server box that are currently stored with a full text index. I have been asked to add a facility to our searches that prioritise results where the result is closer to the start of the column.

CREATE TABLE [dbo].[ftsTest]
(
    [ID] [INT] NOT NULL,
    [Keywords] [NVARCHAR](MAX) NOT NULL,

    CONSTRAINT [PK_ftsTest] 
        PRIMARY KEY CLUSTERED ([ID] ASC)
) ON [PRIMARY] TEXTIMAGE_ON [PRIMARY]

INSERT INTO [dbo].[ftsTest]
VALUES (1, N'Test Two Three Four Five Six Seven Eight Nine Ten'), 
       (2, N'One Test Three Four Five Six Seven Eight Nine Ten'), 
       (3, N'One Two Test Four Five Six Seven Eight Nine Ten'), 
       (4, N'One Two Three Test Five Six Seven Eight Nine Ten'), 
       (5, N'One Two Three Four Test Six Seven Eight Nine Ten'), 
       (6, N'One Two Three Four Five Test Seven Eight Nine Ten'), 
       (7, N'One Two Three Four Five Six Test Eight Nine Ten'), 
       (8, N'One Two Three Four Five Six Seven Test Nine Ten'), 
       (9, N'One Two Three Four Five Six Seven Eight Test Ten'), 
       (10, N'One Two Three Four Five Six Seven Eight Nine Test ')

After setting up a generic full text search for this, I can generate a list of results with either:

SELECT *
FROM ftsTest
WHERE CONTAINS (Keywords, 'test')

SELECT *
FROM CONTAINSTABLE(ftsTest, Keywords, 'test')

The Contains query returns the expected results, however has no priority. The ContainsTable query does return a rank, but has the same value assigned to each.

I understand that the rank takes occurrence (offset of word into document) into account, but I don't know how to access that data. This is probably an RTFM error on my part, but I can't see how to influence the rank to help with the clients requirement.

I've been looking at https://technet.microsoft.com/en-us/library/ms142524(v=sql.105).aspx but I'm still a bit in the dark.

I am working on multiple versions of SQL Server (2008 R2 to 2016) at the moment, but it looks like we will be updating all to 2016 in the future.

Any help gratefully received.

Many thanks.

Possible solution:

FTS has the option of ranking by how close together words are. By adding a control word to the start of the field, I can then calculate distance from that word to give me a ranking.

UPDATE ftsTest  
SET keywords = '_start_ ' + keywords

SELECT *
FROM CONTAINSTABLE(ftsTest, Keywords, 'NEAR((_start_,test))')

This does do what I'm looking for but not going to mark it as an answer yet in case someone has something better. Doesn't seem like the most graceful of solutions and it adds to the size of the full text index.

marc_s
  • 732,580
  • 175
  • 1,330
  • 1,459
Matthew Baker
  • 2,637
  • 4
  • 24
  • 49
  • what if you use `charindex` and sort by where you find the word? (if i am understanding you correctly) https://learn.microsoft.com/en-us/sql/t-sql/functions/charindex-transact-sql (appologies if i am missing something obvious in that suggestion) –  Nov 02 '17 at 11:28
  • e.g. http://data.stackexchange.com/stackoverflow/query/751267/charindex-and-sort-to-find-first-instance-of-word –  Nov 02 '17 at 11:35
  • Logically that makes sense, and was what the client thought we could do. However there are several million rows of data which makes efficiency a key. We use FTS because of the speed advantage over options such as CharIndex, PatIndex or Like. – Matthew Baker Nov 02 '17 at 11:37
  • I thought that I would be missing something :), thanks –  Nov 02 '17 at 11:39
  • No problem. Its a sensible answer on smaller data sets, and will probably help someone else anyway. Many thanks. – Matthew Baker Nov 02 '17 at 11:44
  • 1
    just thinking about this again.. (i dont have the data sets etc. to test) but i wonder if a table that was three columns (item, word, position/priority) would be faster. where each your query would be looking for the lowest number there (this better than other options that are affected by the length of the words in between as much as the number of words in between) - trial migrating to that format and testing may be pretty trivial too? –  Nov 02 '17 at 15:37
  • 1
    Hi Gordatron. The issue here is that I'll be searching for a different word each time. The application here is searching through a keywords field on a catalogue. There could be any number of keywords, and I could be searching for any of them. Holding a separate table just for search with each keyword stored on a separate line is what the FTS does only it has the best available optimisation, and pre-written interfaces to get to that data. You're definitely on the right lines - that's how Microsoft did it! – Matthew Baker Nov 02 '17 at 15:46

0 Answers0