1

For the first time, I'm using full-text search. I have SQL Server 2022 Standard installed, and I've set up full-text search. I created a full-text catalog, [fullTextCatalog_CjeniciArtikli], with ACCENT_SENSITIVITY = OFF.

I have a table named CjeniciArtikli with a column, Naziv, which I've created a full-text index for. The language is set to Croatian, and accent_sensitive is set to false.

When I execute the query below, it doesn't return any results, although it should:

SELECT *
FROM CjeniciArtikli
WHERE CONTAINS(Naziv, 'orink AND tinta AND za AND canon');

However, when I use the LIKE operator with a similar query, it returns 64 records, which is correct:

SELECT *
FROM CjeniciArtikli
WHERE Naziv LIKE '%orink%' AND Naziv LIKE '%tinta%' AND Naziv LIKE '%za%' AND Naziv LIKE '%canon%';

If I modify the first query by removing the word 'za', it returns the records as expected (just like when using LIKE), even though some of those records contain the word 'za'. For example, the first result returned is

Orink tinta za Canon, CLI-521GY, siva, OR-CCLI521G with chi

which contains word 'za'.

SELECT *
FROM CjeniciArtikli
WHERE CONTAINS(Naziv, 'orink AND tinta AND canon'); --works

When I tried searching for just the word 'za', it didn't return any results:

SELECT *
FROM CjeniciArtikli
WHERE CONTAINS(Naziv, 'za');

However, when I used LIKE, it returned 5,877 records:

SELECT *
FROM CjeniciArtikli
WHERE Naziv LIKE '% za %';

I initially thought the issue might be due to two-letter words, but when I tested a query with the word 'sa', it worked without any problems.

I haven't been able to find a solution on the internet or through AI language models, including ChatGPT.

If anyone knows what might be causing this issue, I would greatly appreciate any assistance. I've tried several resources, but without success.

marc_s
  • 732,580
  • 175
  • 1,330
  • 1,459
Mladen
  • 45
  • 1
  • 7
  • Full text indexes don't use and/or, you should read about the query syntax – nbk Jul 22 '23 at 10:16
  • 1
    how it doesnt use AND/OR when it is in official documentation https://learn.microsoft.com/en-us/sql/t-sql/queries/contains-transact-sql?view=sql-server-ver16 – Mladen Jul 22 '23 at 10:56
  • 1
    Could it be that `za` is in a STOPLIST? See: [Configure and Manage Stopwords and Stoplists for Full-Text Search](https://learn.microsoft.com/en-us/sql/relational-databases/search/configure-and-manage-stopwords-and-stoplists-for-full-text-search). – Olivier Jacot-Descombes Jul 22 '23 at 11:45

1 Answers1

1

You can get some insight into this from running

DECLARE @system_stoplist_id INT = 0;

DECLARE @lcid INT = (SELECT lcid
   FROM   sys.syslanguages
   WHERE  alias = 'Croatian');

SELECT *
FROM   sys.dm_fts_parser ('orink AND tinta AND za AND canon', @lcid, @system_stoplist_id, 0) 

This returns

keyword group_id phrase_id occurrence special_term display_term expansion_type source_term
0x006F00720069006E006B 1 0 1 Exact Match orink 0 orink
0x00740069006E00740061 2 0 1 Exact Match tinta 0 tinta
0x007A0061 3 0 1 Noise Word za 0 za
0x00630061006E006F006E 4 0 1 Exact Match canon 0 canon

So by default "za" is treated as a stop word because it is so common (in Croatian).

If you need this not to be treated as a stop word then you will need to customise the stop words used rather than just accepting the system default list.

TBH I find it somewhat surprising that a CONTAINS query with AND stop_word would cause it to return no results rather than just have that component ignored but I can repro this behaviour too.

I guess it does make sense because for every row it is "Unknown" whether or not "za" exists in the value and TRUE AND UNKNOWN is UNKNOWN - not TRUE

The CONTAINS query does at least print out

Informational: The full-text search condition contained noise word(s).

to the messages tab to somewhat alert you.

Martin Smith
  • 438,706
  • 87
  • 741
  • 845
  • Thanks for helping me. That is really weird because the word 'sa' is more common, but it is working. What would be the easiest solution to ignore the word 'za' so my query works? – Mladen Jul 22 '23 at 16:12
  • @Mladen - I guess create your own stop word list rather than using the system default. You can do `SELECT * FROM sys.fulltext_system_stopwords WHERE language_id = 1050` to see what the system default words are for Croatian and see what you want to keep out of that for your custom one – Martin Smith Jul 22 '23 at 16:19
  • Looks like you would need to do `ALTER FULLTEXT INDEX ... SET STOPLIST` to use the new list but I haven't worked with full text search for > 10 years so not something I have particular knowledge on – Martin Smith Jul 22 '23 at 16:25
  • 1
    @Mladen - Looks like there is another option too - if you just want the stop words in the query to be silently ignored https://stackoverflow.com/a/10451332/73226 - but this seems less good because it will then potentially bring back rows that don't contain `za` – Martin Smith Jul 22 '23 at 16:28
  • And actually I just tried the `transform noise words` option and it didn't make any difference for me - even after rebuilding the full text catalog and deleting and recreating the full text index – Martin Smith Jul 22 '23 at 16:37
  • 1
    that solved my problem exactly i wanted to be :)))) – Mladen Jul 22 '23 at 16:38