2

I can't quite figure out why I am getting unexpected results from the following query/statement. I have included code to replicate the issue (which may not be an issue at all but more a misunderstanding on my part about how contains works).

create table dbo.temp (id int identity, description nvarchar(max))
insert dbo.temp values ('this is a website.') --this record will be returned in the select query
insert dbo.temp values ('a website exists.') --this record will be returned in the select 
insert dbo.temp values ('go to mywebsite.net') --this record will NOT be returned in the select 
insert dbo.temp values ('go to mywebsite.net.') --this record will NOT be returned in the select 


create fulltext catalog temp
create unique index idx_dbo_temp_1 on dbo.temp (id)
create fulltext index on dbo.temp(description)
    key index idx_dbo_temp_1 on temp
    with change_tracking auto


declare @search_client nvarchar(100) = 'website'

select
    *
from
    dbo.temp
where
    contains ((description),@search_client)


drop fulltext index on dbo.temp
drop index idx_dbo_temp_1 on dbo.temp
drop fulltext catalog temp
drop table dbo.temp

The query will return records that have website in the description field but will not return the record which has mywebsite.net in the description field.

Any thoughts?

UPDATE: the @search_client variable will really be a parameter passed in via SSRS so declared the variable to simulate the parameter being passed in.

thomas
  • 2,592
  • 1
  • 30
  • 44
  • you can use the like query instead.. – Shaikh Farooque May 21 '12 at 12:42
  • the like query would work in this example instance. but not really feasible on a table with 3 million records and 5 search fields I don't think. The query would run for hours. – thomas May 21 '12 at 12:44
  • If you want the mywebsite.net also then you have to use the wild card chars like * in your @search_client. – Shaikh Farooque May 21 '12 at 12:47
  • i just tried setting @search_client to `'*website*'` and got the same results. – thomas May 21 '12 at 12:51
  • check out the following answer. – Shaikh Farooque May 21 '12 at 12:52
  • Leading wildcards are not supported in FTS (there is no effective strategy to index them) http://stackoverflow.com/questions/3400/how-do-you-get-leading-wildcard-full-text-searches-to-work-in-sql-server (and to get the trailing wild card to work in a similar fashion to LIKE you would need quotes; `'"website*'"`) – Alex K. May 21 '12 at 12:57
  • Hey Alex. I tried this. `declare @search_client nvarchar(100) = '"website*"'` and I get the same results. Should I be setting the variable to something different? – thomas May 21 '12 at 13:10
  • You cannot use CONTAINS to match `*website` only `website*`, '"website*"' does the latter so would match `website` – Alex K. May 21 '12 at 13:13
  • Thanks Alex. If you want to post an answer I will mark it. – thomas May 21 '12 at 13:27

3 Answers3

2

In Fulltext, all nonalphanumeric characters are removed in the indexes and replaced by blanks. So in your search, since you had "." in the string, you are searching for "website" and "net".

You can fix this in 2 ways.

You need to have a separate table or separate field that has the fulltext data, separated from the original table were you keep the original data.

In the fulltext table you can remove the "." and store "websitenet".

In that case, you need to remove all "." from the search string before you do the query. If you want to query with ".", you need to replace "." with a character string - for example "dot".

So in this case you would store "websitedotnet".

When you search this time, you replace all "." with "dot" in the query.

OK now your case with a new field where store the column to be search by FTS so:

    ID      DESCRIPTION               DESCFTS
    -----------------------------------------------------
    1   this is a website.        this is a websitedot
    2   a website exists.         a website existsdot
    3   go to mywebsite.net       go to mywebsitedotnet
    4   go to mywebsite.net.      go to mywebsitedotnetdot

then your query:

declare @search_client nvarchar(100) = 'website'

set @search_client = replace(@search_client, '.', 'dot')

select * from dbo.temp where contains ((DESCFTS), @search_client)
Luka Milani
  • 1,541
  • 14
  • 21
  • Hey Luka. In my example code, I built the full text index and then immediately ran a query...so it isn't an out of date index issue. In your example of `websitedotnet*` the records in question would still be missed as the website is mywebsite.net. It sounds like what I am trying to do may not be possible with FTS. – thomas May 21 '12 at 13:41
  • No No, your problem is the DOT. In Fulltext, all nonalphanumeric characters are removed in the indexes and replaced by blanks so you have to replace. Check the BOL of SQL server for more info about this – Luka Milani May 21 '12 at 13:43
  • I believe the `my` prefix is the issue here, in addition to the `dot` issue you talked about. – thomas May 21 '12 at 14:10
  • Sorry but NO again, if the problem is the prefix, what is the purpose of FTS to search ? Try my example will work. The BOL (Books On Line) of SQL explain clearly the concept of nonalphanumeric – Luka Milani May 21 '12 at 14:14
  • If I just change my insert statements to insert what you have in the DESCFTS column instead of what I have inserted in my initial question this does not work. After further research, you cannot use a wildcard prefix with FTS. – thomas May 21 '12 at 14:19
  • Also, I don't want to have to search for `mywebsite.net` or `mywebsitedotnet`. I simply wanted to be able to search for `website` and have it return all instances where the letters `website` where found. – thomas May 21 '12 at 14:20
  • That's OK Thomas :) just wanna help you but i am sure this is the right way to use FTS, because one of the problem is to understand that FTS omit many words as punctualization or denied words (common) as "AS" "IS" "FOR", ... I'd used FTS for long time and i hit my face against those problems already :) – Luka Milani May 21 '12 at 14:28
  • hey luka. i am certainly open to this idea but was just not able to make it work with your suggestions. no matter what i do i cannot get the match i want unless i use the word `"mywebsite*"` which will not work for me as all i have as input is `website` and don't know to search for `mywebsite`. – thomas May 21 '12 at 14:46
1

Try the following:

Updated:

select
    *
from
    dbo.temp
where
    contains ((description),'"website*"')
Shaikh Farooque
  • 2,620
  • 1
  • 19
  • 33
  • the updated code still returns the same results. additionally this will be called from an SSRS report and a parameter will be passed in, which is why I have declared the @search_client variable in the code (to simulate the SSRS parameter being passed to the query). – thomas May 21 '12 at 13:03
0

I think the problem you are having is that, unforutnately, it is not possible to search with a leading wildcard in FTS. If you are searching with the CONTAINS phrase, you cannot use leading, only the trailing functionally. Look for a workaround to leading wildcard searches. IT IS THE MY that is creating the problem in the mywebsite.net not showing. This in combination with the DOT issue already mentioned. There have been some workarounds but they seem hacky to me - and also research FREETEXT for performance gain possibility.

user1166147
  • 1,570
  • 2
  • 15
  • 17