Using TSQL, CAST() with COLLATE is non-deterministic. How to make it deterministic? What is the work-around?

Question

I have a function that includes:

SELECT @pString = CAST(@pString AS VARCHAR(255)) COLLATE SQL_Latin1_General_Cp1251_CS_AS

This is useful, for example, to remove accents in french; for example:

UPPER(CAST('Éléctricité' AS VARCHAR(255)) COLLATE SQL_Latin1_General_Cp1251_CS_AS)

gives ELECTRICITE.

But using COLLATE makes the function non-deterministic and therefore I cannot use it as a computed persisted value in a column.

Q1. Is there another (quick and easy) way to remove accents like this, with a deterministic function?

Q2. (Bonus Question) The reason I do this computed persisted column is to search. For example the user may enter the customer's last name as either 'Gagne' or 'Gagné' or 'GAGNE' or 'GAGNÉ' and the app will find it using the persisted computed column. Is there a better way to do this?

EDIT: Using SQL Server 2012 and SQL-Azure.

With a question of this level you might be better off asking on http://dba.stackexchange.com — Scott Chamberlain, Oct 02 '13 at 04:26
maybe this link is of use to you, it reads like in the last paragraph he is dealing with what you want http://www.youdidwhatwithtsql.com/comparing-accented-strings-with-tsql/849/ — DrCopyPaste, Oct 02 '13 at 07:54
Re Q2: Consider using full text search and make the full text catalog [accent insensitive](http://markeboyle.wordpress.com/2011/01/21/sql-server-2008-full-text-indexing-accent-sensitivity/). — Keith, Oct 02 '13 at 13:35
Dr: Thanks for the link - great SQL site. Keith: My understanding is that you cannot use full text in SQL Azure - has that changed recently? — Frank Monroe, Oct 03 '13 at 13:45

score 1 · Answer 1 · answered Oct 03 '13 at 00:22

1

You will find that it is in fact deterministic, it just has different behavior depending on the character you're trying to collate.

Check the page for Windows 1251 encoding for behavior on accepted characters, and unacceptable characters.

Here is a collation chart for Cyrillic_General_CI_AI. This is codepage 1251 Case Insensitive and Accent Insensitive. This will show you the mappings for all acceptable characters within this collation.

As for the search question, as Keith said, I would investigate putting a full text index on the column you are going to be searching on.

answered Oct 03 '13 at 00:22

Codeman

12,157
10
53
91

1

It is SQL that complains that the function is non-deterministic. I don't have a copy of the exact message (I fixed the issue by now) but it throws an error. So I can't argue with it that it is not deterministic. Thanks. – Frank Monroe Oct 03 '13 at 13:44
1

How did you fix the issue, then? – Codeman Oct 03 '13 at 20:49

score 1 · Answer 2 · answered Oct 04 '13 at 04:15

The best answer I got was from Sebastian Sajaroff. I used his example to fix the issue. He suggested a VIEW with a UNIQUE INDEX. This gives a good idea of the solution:

create table Test(Id int primary key, Name varchar(20))
create view TestCIAI with schemabinding as
select ID, Name collate SQL_Latin1_General_CP1_CI_AI as NameCIAI from Test 
create unique clustered index ix_Unique on TestCIAI (Id)
create unique nonclustered index ix_DistinctNames on TestCIAI (NameCIAI)
insert into Test values (1, 'Sébastien')
--Insertion 2 will fail because of the unique nonclustered indexed on the view 
--(which is case-insensitive, accent-insensitive)
insert into Test values (2, 'Sebastien')

Using TSQL, CAST() with COLLATE is non-deterministic. How to make it deterministic? What is the work-around?

2 Answers2