29

I have a table that contains, for example, two fields that I want to make unique within the database. For example:

create table Subscriber (
    ID int not null,
    DataSetId int not null,
    Email nvarchar(100) not null,
    ...
)

The ID column is the primary key and both DataSetId and Email are indexed.

What I want to be able to do is prevent the same Email and DataSetId combination appearing in the table or, to put it another way, the Email value must be unique for a given DataSetId.

I tried creating a unique index on the columns

CREATE UNIQUE NONCLUSTERED INDEX IX_Subscriber_Email
ON Subscriber (DataSetId, Email)

but I found that this had quite a significant impact on search times (when searching for an email address for example - there are 1.5 million rows in the table).

Is there a more efficient way of achieving this type of constraint?

Neilski
  • 4,385
  • 5
  • 41
  • 74
  • Are you saying that the search WITHOUT the index is considerably faster than WITH the index? This is news to me, as far as I knew it the indexes were always created to speed up searches, NOT to slow them down. – Germann Arlington Aug 01 '12 at 07:51
  • 1
    No, that's about it but it should not have **any** significant impact on your search times?! How much impact are we talking about? Can you show the execution plan? Did you update your statistics? – Lieven Keersmaekers Aug 01 '12 at 07:52
  • A search on email address with 'simple' indexes on Email and DataSetId took around 1second. By adding the additional compound index, this increased to around 9 seconds. – Neilski Aug 01 '12 at 12:24
  • I've been playing with this all morning and I actually think I have a different problem... The performance drop I was seeing was measured from the UI (I know, I know) an MVC web page using LINQ. It does indeed show a significant performance drop when the index is added. If I take the SQL expression from the LINQ and run that in directly in SQL Server Management Studio, then I actually see a performance increase (with the index). So, sorry to have mislead everyone. What I don't understand though is why the LINQ expression, when run from the web page, actually runs up to 10x slower. – Neilski Aug 01 '12 at 13:15

2 Answers2

51

but I found that this had quite a significant impact on search times (when searching for an email address for example

The index you defined on (DataSetId, Email) cannot be used for searches based on email. If you would create an index with the Email field at the leftmost position, it could be used:

CREATE UNIQUE NONCLUSTERED INDEX IX_Subscriber_Email
   ON Subscriber (Email, DataSetId);

This index would server both as a unique constraint enforcement and as a means to quickly search for an email. This index though cannot be used to quickly search for a specific DataSetId.

The gist of it if is that whenever you define a multikey index, it can be used only for searches in the order of the keys. An index on (A, B, C) can be used to seek values on column A, for searching values on both A and B or to search values on all three columns A, B and C. However it cannot be used to search values on B or on C alone.

Remus Rusanu
  • 288,378
  • 40
  • 442
  • 569
  • 1
    All true but this doesn't explain how adding an index negatively impacts search times *(significantly)* – Lieven Keersmaekers Aug 01 '12 at 08:03
  • @Lieven: I suspect the OP did more than just create a new constraint. Eg. it dropped an existing index on `Email` – Remus Rusanu Aug 01 '12 at 08:08
  • 2
    Very likely but to paraphrase [Elmer Fud](http://en.wikipedia.org/wiki/Elmer_Fudd), OP is *vewy vewy quiet* – Lieven Keersmaekers Aug 01 '12 at 08:15
  • 3
    Now you're just spreading [Fud](http://en.wikipedia.org/wiki/Fear,_uncertainty_and_doubt) ;) – Remus Rusanu Aug 01 '12 at 08:17
  • OK, I think you have highlighted an area I don't fully understand, perhaps you can help... Currently the DataSetID column and the Email column are both indexed because I need to find/filter records on either a DataSetId or/or an Email address. From what you are saying, I could drop the Email index and replace with ON Subscriber (Email, DataSetId), but then won't I have the same performance problem if I 'just' want to filter by DataSetId? – Neilski Aug 01 '12 at 09:46
  • Both an index on `(Email)` and one on `(Email, DataSetId)` could be used to search an email, but the second one could also enforce the unique constraint. As well, both an index on `(DataSetId)` and one on `(DataSetId, Email)` could be used to search a DataSetId but the second one could enforce the unique constraint. At this moment is better if you provide additional info, like what indexes you created exactly and the execution plan of the problem query. – Remus Rusanu Aug 01 '12 at 10:24
-2

I assume that only way to enter data into that table is through SPs, If that's the case you can implement some logic in your insert and update SPs to find if the values you are going to insert / update is already exists in that table or not.

Something like this

create proc spInsert
(
    @DataSetId int,
    @Email nvarchar(100)
)
as
begin

if exists (select * from tabaleName where DataSetId = @DataSetId and Email = @Email)
    select -1 -- Duplicacy flag
else
begin
    -- insert logic here
    select 1 -- success flag
end

end
GO


create proc spUpdate
(
   @ID int,
   @DataSetId int,
   @Email nvarchar(100)
)
as
begin

if exists 
(select * from tabaleName where DataSetId = @DataSetId and Email = @Email and ID <> @ID)
    select -1 -- Duplicacy flag
else
begin
    -- insert logic here
    select 1 -- success flag
end

end
GO
yogi
  • 19,175
  • 13
  • 62
  • 92
  • -1 for recommending a procedural solution instead of a constraint enforcement. Besides, the procedure would not work under concurrency. – Remus Rusanu Aug 01 '12 at 08:01
  • @RemusRusanu `procedure would not work under concurrency` what do you mean by that ? – yogi Aug 01 '12 at 08:04
  • 1
    multiple transactions can check the condition *concurrently* and conclude that there is no duplicate. All thread proceed to insert, creating duplicates. Is a well know issue with the `check and insert` codding pattern: http://sqlblog.com/blogs/alexander_kuznetsov/archive/2010/01/12/t-sql-tuesday-002-patterns-that-do-not-work-as-expected.aspx – Remus Rusanu Aug 01 '12 at 08:11