0

Little weird requirement, but here it goes. We have a CustomerId VarChar(25) column in a table. We need to make it NVarChar(25) to work around issues with type conversions.

CHARINDEX vs LIKE search gives very different performance, why?

But, we don't want to allow non-latin characters to be stored in this column. Is there any way to place such a constraint on column? I'd rather let database handle this check. In general we OK with NVarChar for all of our strings, but some columns like ID's is not a good candidates for this because of possibility of look alike strings from different languages

Example:

CustomerId NVarChar(1) - PK
  • Value 1: BOPOH
  • Value 2: ВОРОН

Those 2 strings different (second one is Cyrillic)

I want to prevent this entry scenario. I want to make sure Value 2 can not be saved into the field.

Community
  • 1
  • 1
katit
  • 17,375
  • 35
  • 128
  • 256
  • The question doesn't make much sense - nvarchar *is* Unicode, there *are* no invalid characters. There is no difference between Latin and other characters. What type conversion issues do you have? As for IDs, *any* text type is a bad choice. As for the linked question, it confuses how LIKE and CHARINDEX work - there are **NO** conversion issues. LIKE can use indexes *ONLY for a prefix scan*. `CHARINDEX` can't though and has to scan the entire table. – Panagiotis Kanavos Sep 28 '15 at 15:25
  • I added example. Coversion issue explained in linked question(ANSWER). IDs with text is OK, this is visible ID's, DB built with Int keys, but ID's can be changed if needed, implemented as unique indexes – katit Sep 28 '15 at 15:29
  • 1
    The linked question is wrong. The reason for the performance difference is because prefix scans are essentially range queries. What you describe isn't a type conversion issue at all - no types are converted, no values changed. What you describe is called [string normalization](https://en.wikipedia.org/wiki/Unicode_equivalence#Normalization) and is best done [in the application](https://msdn.microsoft.com/en-us/library/ebza6ck1(v=vs.110).aspx) – Panagiotis Kanavos Sep 28 '15 at 15:36
  • See answer to linked question. It was issue with type conversion(not related to CHARINDEX vs LIKE). And it's a well-known issue. Besides a point, we are going to make our fields NVarChars to avoid issues in a future. We CAN make string normalization in app but I prefer to prevent wrong entry at DB level if possible. – katit Sep 28 '15 at 15:44
  • Insert into a VarChar and have an NVarchar as computed or a trigger? – paparazzo Sep 28 '15 at 15:58
  • @Frisbee this will make things much worse - characters that don't match the field's collation will be converted to the wrong codepage or lost (replace by `?`) – Panagiotis Kanavos Sep 28 '15 at 16:13
  • Good thing it was only a comment – paparazzo Sep 28 '15 at 16:16
  • 1
    Triggers probably overklill, but I wonder if CAST() to VarChar and comparison inside constraint will do? If character converted wrong or lost - it won't match and constraint won't let this field in, right? – katit Sep 28 '15 at 16:38
  • How often are inserts occurring ? You can create a trigger with an ASCII check for the characters. – Amir Pelled Sep 28 '15 at 21:07

1 Answers1

0

Just in case it helps somebody. Not sure it's most "elegant" solution but I placed constraint like this on those fields:

ALTER TABLE [dbo].[Carrier]  WITH CHECK ADD  CONSTRAINT [CK_Carrier_CarrierId] CHECK  ((CONVERT([varchar](25),[CarrierId],(0))=[CarrierId]))
GO
katit
  • 17,375
  • 35
  • 128
  • 256