6

I have a table with a name field with this

Test O'neill 123

If I use

SELECT  *
    FROM    table F
    WHERE   CONTAINS ( F.*, '"Test O''neill 123"' )

it works fine but if I use a wildcard * I get no results.

SELECT  *
    FROM    table f
    WHERE   CONTAINS ( F.*, '"Test O''neill 123*"' )

why is this ? I am using a parser for my search terms and this is adding the wildcard *

I checked some sites, about escaping the ' but I haven't found anything referred to this..

Thanks in Advance

Gonzalo.-
  • 12,512
  • 5
  • 50
  • 82
  • You are escaping `'` correctly. I can't reproduce this problem with SQL Server 2014. What version are you using? – Keith Sep 04 '15 at 21:35
  • sql server 14 too.. could be something about collation or languages ? I have no clue abuot this, I am not a dba, just a regualr developer, so I am not sure where to look at.. – Gonzalo.- Sep 08 '15 at 14:08
  • Not sure. In case it helps, I'm using collation SQL_Latin1_General_CP1_CI_AS and language is English. – Keith Sep 08 '15 at 16:16
  • what I've found is that the apostrophe is a word breaker so I cannot use wildcards, only exact matches – Gonzalo.- Sep 08 '15 at 17:44
  • Are you using a custom stoplist? What is the datatype & size of the name field? Is the text stored as `Test O'neill 123` exactly or is this part of a longer string? – Keith Sep 08 '15 at 19:45
  • it is stored exactly like that. There is a stoplist, and in a table related sys.fulltext_stopwords I see there is a stopword 'o', which is the first part of O'Neill.. could that be related ? I don't understand exactly how is this used. Datatype is nvarchar(200) – Gonzalo.- Sep 08 '15 at 19:48
  • The stoplist could be a factor but so far it hasn't made a difference in my tests, even with 'o' as a stopword. Was this database upgraded or imported from an older version of SQL Server? – Keith Sep 08 '15 at 21:09
  • I think I've found that our language is "Neutral" and there the apostrophe is a break word while in English it is not – Gonzalo.- Sep 08 '15 at 21:31
  • @Gonzalo.- you can check your server collation `SELECT SERVERPROPERTY('Collation')` – wiretext Sep 09 '15 at 09:42
  • SQL_Latin1_General_CP1_CI_AS – Gonzalo.- Sep 09 '15 at 14:14

1 Answers1

8

The problem is due to the combination of 1) using the Neutral language 2) plus a stoplist for your full text index 3) plus unexpected behavior when using a wildcard in a search that includes stopwords.

The Neutral language doesn't cover all of the nuances of the English language, so at index-time it considers O'neill to be 2 separate words O and neill. Then your stoplist considers O to be a stopword so this "word" is not added to the index, only neill is.

At search-time, the search engine typically ignores stop words in multi-word phrases. For example, searching for Contains(*, '"we x people"') will match the text ...we the people..., x and the both being stopwords and thus automatically "matching" each other. (I use the term "matching" loosely because the search engine is not matching the stopwords, but rather it knows that people is 1 word away from we.)

So you might expect the wildcard search Contains(*, '"we the people*"') to also find its match, except that it does not when using a stoplist. If it weren't for the stopword the in the search phrase, or if the was not considered a stopword, the search would work fine. I really can't explain this behavior but I suspect it has something to do with the way the word positions are computed. I also suspect it is not the intended behavior.

So back to your case, Contains(*, '"Test O''neill 123"') will find a match but the wildcard search Contains(*, '"Test O''neill 123*"') does not. (You can even simplify the search to Contains(*, '"O''neill*"') and you'll see that it still does not find a match.) The combination of the stopword O with a wildcard runs into the problem I explained in the last paragraph. This is the crux of the problem stated in your question.

Solutions ranging from most-effective to least-effective-but-possibly-more-practical-for-your-case:

1) Change the language on your full text index to English and re-index. This will cause O'neill to be treated as 1 word and thus you'll avoid the weird wildcard behavior that I explained. You can change the language in the full text index properties via SQL Server Management Studio or by dropping and recreating the index as follows:

ALTER FULLTEXT INDEX ON MyTable DROP (Column1) 
GO
ALTER FULLTEXT INDEX ON MyTable ADD (Column1 LANGUAGE [English])
-- repeat for each column in the index

2) If you need to keep using the Neutral language, consider removing O from your stoplist and re-index.

ALTER FULLTEXT STOPLIST MyStoplist DROP 'o' LANGUAGE 'Neutral';

3) Or don't use a stoplist if you don't need one.

ALTER FULLTEXT INDEX ON MyTable SET STOPLIST = OFF

4) If none of the above solutions are practical, consider removing stopwords from the search phrase, or at least the O' prefix in surnames.

Keith
  • 20,636
  • 11
  • 84
  • 125