3

I have a requirement to make a varchar(500) column unique.
Putting a unique constraint on it would not work, as the column crosses the size limit for a unique index key.
What, then, would be the best strategy to make the column unique?

  • 1
    Are you stuck with the database structure or can you alter it? – six8 Oct 03 '11 at 18:24
  • Yes I can alter the structure. –  Oct 03 '11 at 18:25
  • What is the size limitation on unique indexes? – James Johnson Oct 03 '11 at 18:27
  • By 500 I mean size larger than that, like if we put unique index on varchar(1000) field then it would give following warning: "Warning! The maximum key length is 900 bytes. The index 'UQ__tmp__43D2EAF819DFD96B' has maximum length of 1000 bytes. For some combination of large values, the insert/update operation will fail." –  Oct 03 '11 at 18:32
  • 2
    Size limit is 900 bytes for the key of an index. So varchar(900) or nvarchar(450). How many rows do you expect? – gbn Oct 03 '11 at 18:59

2 Answers

4

Create another field holding the hash of that varchar(500) value, e.g. HASHBYTES('md5', MyLongVarcharField), and put a UNIQUE constraint on the hash field.

This will cost some performance, but if you have a varchar(500) on which you need to enforce uniqueness, I'm assuming performance isn't at the forefront of your concerns anyway.
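As a sketch of the idea (the table name `MyTable` and column name `MyLongVarcharField` are placeholders, not from the question), one way to do this in SQL Server is a persisted computed column with a unique constraint:

```sql
-- Hypothetical table with the long column we want to keep unique.
-- HASHBYTES('md5', ...) yields varbinary(16), well under the 900-byte index key limit.
alter table dbo.MyTable
    add MyFieldHash as hashbytes('md5', MyLongVarcharField) persisted;

alter table dbo.MyTable
    add constraint UQ_MyTable_MyFieldHash unique (MyFieldHash);
```

Making the computed column `persisted` means the hash is stored and indexable rather than recomputed on every read.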

EDIT:

To clarify, the chance of two distinct strings producing the same 128-bit hash value is roughly 1 in 2^128, i.e. 1 in about 340,282,366,920,938,000,000,000,000,000,000,000,000. A collision is extremely unlikely, but not categorically impossible.

If you are still concerned you can use SHA or SHA1 algorithms which are 160 bits.
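If you go that route, the only change is the algorithm name passed to HASHBYTES (same placeholder table and column as above); the hash column becomes varbinary(20) instead of varbinary(16):

```sql
-- Same technique with SHA-1 (HASHBYTES accepts 'SHA' or 'SHA1' for it).
alter table dbo.MyTable
    add MyFieldHash as hashbytes('SHA1', MyLongVarcharField) persisted;
```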

JNK
  • 63,321
  • 15
  • 122
  • 138
  • Won't hash collisions be a real issue with this solution, assuming there are enough rows of data? It would only be in really rare circumstances. – Matthew Vines Oct 03 '11 at 18:31
  • 1
    Chances of a hash collision with md5 are 1/2^128. It's more likely the OP will be struck by lightning at the same time that a meteor hits him. – JNK Oct 03 '11 at 18:32
  • 2^28 is 268,435,456 which is big, but not that big in certain systems. I agree it's not a deal breaker, but it's something to consider. – Matthew Vines Oct 03 '11 at 18:35
  • @MatthewVines - I mistyped, its 2^128 not 28. :) – JNK Oct 03 '11 at 18:36
  • @JNK can you please provide some info or link on why to create HASH field and how it actually hits the performance? –  Oct 03 '11 at 18:40
  • 1
    @DarshanJoshi - **No.** For performance you will need to test, but it will negatively impact it since you are running a function against a long string for each row. For a reference, this is a very common technique. I'm not going to give you a citation on how a `WHERE` clause works either, but if you don't trust me you can check MSDN. – JNK Oct 03 '11 at 18:43
  • 1
    One could use SHA1 to reduce collisions, both are subject to the birthday paradox anyway http://en.wikipedia.org/wiki/Birthday_paradox – gbn Oct 03 '11 at 19:18
1

You can use a DML trigger on INSERT and UPDATE. This gets around having to use a hash. I chose an AFTER trigger for simplicity, but this can be easily done with an INSTEAD OF trigger as well.

create trigger dml_EnforceUniqueVal
on dbo.UniqueBigColumn
after insert, update
as
begin
    -- Roll back if any value touched by this statement now appears more
    -- than once in the table. Checking per value (GROUP BY ... HAVING)
    -- avoids false rollbacks when one statement inserts several distinct rows.
    if exists
    (
        select somestring
        from UniqueBigColumn
        where somestring in (select somestring from inserted)
        group by somestring
        having count(*) > 1
    )
    begin
        raiserror('Duplicate value in somestring.', 16, 1);
        rollback;
    end
end
go
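A quick way to exercise the trigger (table name taken from the answer; the column definition is an assumption, since the answer never shows the DDL):

```sql
create table dbo.UniqueBigColumn (somestring varchar(1000));
go
-- Create the trigger above, then:
insert into dbo.UniqueBigColumn values ('first value');  -- succeeds
insert into dbo.UniqueBigColumn values ('first value');  -- duplicate: trigger rolls it back
```

Note that an AFTER trigger lets the duplicate row be written and then rolls the statement back, so every failed insert still pays the write cost; an INSTEAD OF trigger could reject the row before it is written.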