
I have this table (TableA):

CREATE TABLE [TableA]
(
    [FieldA] [int] NOT NULL,
    [FieldB] [int] NOT NULL,
    [Value] [float] NULL,
    CONSTRAINT [PK_TableA] PRIMARY KEY CLUSTERED
    (
        [FieldA] ASC,
        [FieldB] ASC
    )
)

There are only a few distinct FieldA values; let's say FieldA can be {1,2,3,4,5,6}.

Why does this query cause a full table scan:

SELECT COUNT(*) FROM TableA WHERE FieldB = 1

While this doesn't:

SELECT COUNT(*) FROM TableA WHERE FieldB = 1 AND FieldA IN (1,2,3,4,5,6)

Can't SQL Server optimize this? If I had a TableB where FieldA was the PK and I joined TableB to TableA, the query would run similarly to the second query.
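
A quick way to see the difference, assuming TableA already has data in it, is to compare the I/O statistics of the two queries:

SET STATISTICS IO ON;

-- The first query scans; the second seeks, as described above.
SELECT COUNT(*) FROM TableA WHERE FieldB = 1;
SELECT COUNT(*) FROM TableA WHERE FieldB = 1 AND FieldA IN (1,2,3,4,5,6);

SET STATISTICS IO OFF;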

ytoledano
  • The second version doesn't perform a full table scan presumably because there are more values in column A than `1,2,3,4,5,6` - or at least there could be. The first version doesn't allow the optimizer to narrow down to any range because you haven't included the key column, so it has to check every row. If you add a non-clustered index with FieldB as the leading column you should see different performance / behavior. – Aaron Bertrand Mar 24 '12 at 13:09
  • A good analogy for a two-column index is a phone book, which is ordered by `lastname, firstname`. Your first query is analogous to trying to find everyone called "John" in the book: the ordering by lastname doesn't help. And if the second query ends up doing multiple seeks that return all the rows anyway, it is no more efficient than a full scan. – Martin Smith Mar 24 '12 at 13:24
  • A compound index can be used whenever you query on the n left-most columns of it. So in your case, your compound primary key helps for queries on `FieldA`, or for queries on `FieldA` and `FieldB` - but it **CANNOT** ever be used for queries on `FieldB` alone. Specifying two columns in the compound key **DOES NOT** mean that searches on each of those columns (separately) will be sped up. – marc_s Mar 24 '12 at 13:40
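
A minimal sketch of the index Aaron Bertrand suggests in the comments, with FieldB as the leading (and only) key column so a FieldB-only predicate can seek (the index name is just illustrative):

-- Non-clustered index that lets WHERE FieldB = 1 seek instead of scan
CREATE NONCLUSTERED INDEX IX_TableA_FieldB ON TableA (FieldB);

For `SELECT COUNT(*) ... WHERE FieldB = 1` this narrow index is enough; queries that also return `Value` would need it as an included column or a key lookup.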

2 Answers


Apparently, what I was looking for is a skip-scan optimization, which is available in Oracle but not in SQL Server. A skip scan can use an index even when the predicate on the leading column is missing: http://social.msdn.microsoft.com/Forums/eu/transactsql/thread/48de15ad-f8e9-4930-9f40-ca74946bc401
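
In the meantime, a manual workaround along the lines of what the second query in the question does is to enumerate the distinct leading-key values and seek for each one. A rough sketch (whether the optimizer actually produces seeks depends on statistics and version):

-- Emulate a skip scan: list the distinct FieldA values, then probe the
-- clustered index for each one with FieldB = 1.
SELECT SUM(c.cnt) AS total
FROM (SELECT DISTINCT FieldA FROM TableA) AS a
CROSS APPLY (SELECT COUNT(*) AS cnt
             FROM TableA AS t
             WHERE t.FieldA = a.FieldA
               AND t.FieldB = 1) AS c;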

ytoledano
  • This answer is correct. There is a Microsoft Connect item which can be voted on: https://connect.microsoft.com/SQLServer/feedback/details/695044/implement-index-skip-scan – usr Jul 09 '12 at 18:56

The clustered index you've created is built on two columns, with FieldA first. If you're doing a lookup on FieldB alone, SQL Server has no leading-key value to seek with on that index, so it falls back to scanning the whole table.

Even though FieldA can only contain a small range of values, the optimizer doesn't inspect that range to see whether it could "fudge" a leading key out of the information you've given it.

If you want to improve the performance of the first query, you will have to create another index on FieldB. If, as you say, there are not many distinct values in FieldA and you do most of your lookups on FieldB exclusively, you might also consider moving your clustered index so it is built only on FieldB and creating a unique index over FieldA and FieldB.
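
A sketch of that second option, assuming the table can be recreated (index names are illustrative):

CREATE TABLE [TableA]
(
    [FieldA] [int] NOT NULL,
    [FieldB] [int] NOT NULL,
    [Value] [float] NULL,
    CONSTRAINT [PK_TableA] PRIMARY KEY NONCLUSTERED ([FieldA] ASC, [FieldB] ASC)
)

CREATE CLUSTERED INDEX IX_TableA_FieldB_Clustered ON TableA (FieldB);

The non-clustered primary key keeps (FieldA, FieldB) unique, while the clustered index on FieldB lets the first query seek.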

Paul Turner
  • Or simply change the order of the columns in the primary key when creating the table: PRIMARY KEY CLUSTERED ([FieldB] ASC, [FieldA] ASC). This lets you keep the two-column primary key and still use the PK index when the query filters only on FieldB. – Kharaone Mar 24 '12 at 13:59
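
That reordered key would look roughly like this (again a sketch, assuming the table can be recreated):

CREATE TABLE [TableA]
(
    [FieldA] [int] NOT NULL,
    [FieldB] [int] NOT NULL,
    [Value] [float] NULL,
    CONSTRAINT [PK_TableA] PRIMARY KEY CLUSTERED
    (
        [FieldB] ASC,
        [FieldA] ASC
    )
)

Note that queries filtering on FieldA alone would then be the ones that scan, so this only makes sense if FieldB is the more common lookup column.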