2

I have table structure as below

CREATE TABLE [dbo].[AIRQUALITYTS2]
(
    [FeatureID] [nvarchar](20) NOT NULL,
    [ParameterID] [nvarchar](20) NOT NULL,
    [MeasurementDateTime] [datetime2](7) NOT NULL,
    [ParameterValue] [numeric](38, 8) NULL,
    [Remarks] [nvarchar](150) NULL,

    CONSTRAINT [PK_AIRQUALITYTS2] 
        PRIMARY KEY CLUSTERED ([FeatureID] ASC, [ParameterID] ASC, [MeasurementDateTime] ASC)
                    WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, 
                          IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, 
                          ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]

When I execute this query:

set statistics io on

SELECT 
    COUNT(featureid), featureid 
FROM
    AIRQUALITYTS2 
WHERE
    FeatureID LIKE 'AS%' 
    AND ParameterID = 'AP2' 
    AND YEAR(MeasurementDateTime) = 2015
GROUP BY 
    FeatureID
ORDER BY 
    FeatureID

I see the logical records 101871 and query execution plan is

enter image description here

But when I add non-clustered index on this table as

 CREATE NONCLUSTERED INDEX non_fidpidmdate
     ON [dbo].[AIRQUALITYTS2] ([ParameterID], [FeatureID])
     INCLUDE ([MeasurementDateTime])

When I execute same query I see logical records reads 4636 only and is very fast and query execution plan is

enter image description here

Question 1: when there is less logical records in second query.

Question 2: Why first query is using clustered index scan as displayed in first image though it has clustered index on featureid,ParameterID and MeasurementDateTime while after adding non-cluster index it uses Index Seek (Non-Clustered) second image as displayed in images

Note: I have change where clause to

MeasurementDateTime >= '2004-01-01 00:00:00' 
and MeasurementDateTime <= '2004-12-31 00:00:00' 

to make it sargable but still the results are the same.

user641812
  • 335
  • 5
  • 19

2 Answers2

2

For question 1: since your index is covering (it contains all the data the query wants to retrieve, and needs for querying and ordering), the query can be run entirely against the index (and its data pages) and use a seek, which obviously loads a great deal fewer pages from disk than when scanning the whole table (clustered index scan = table scan) with all its data.

Not sure what you mean with your question #2 ....

marc_s
  • 732,580
  • 175
  • 1,330
  • 1,459
  • Before adding non-cluster index it's uses clustered index scan while after it uses Index Seek (Non-Clustered) – user641812 Oct 24 '16 at 10:54
  • @user641812: yes - and ? What 's the problem / issue / question with that?? I don't understand *what you're expecting*, and what you're seeing instead - I don't see any "delta" between expectation and reality.... – marc_s Oct 24 '16 at 10:56
  • Thanks for your response. then I think I need to get more clarity on how clustered and non-clustered index works. I was expecting same behavior if in first we have seed then in second we should have same thing. However if you can help me logic for same. And also in first case how we can run query against the index – user641812 Oct 24 '16 at 11:03
  • 2
    Well, the **clustered** index is special in that it physically orders the data in the table (that's why there can only be 1 clustered index per table), and the *leaf nodes* of the clustered index **are the DATA pages** that make up the table - unlike in any other index. And the SQL Server query optimizer is pretty smart about when to use a **seek**, and when that doesn't have any benefit (opting to do a **scan** instead) – marc_s Oct 24 '16 at 11:04
  • Can we run our query in first case against the index. – user641812 Oct 24 '16 at 11:06
  • @user641812: maybe you *can* - but if the SQL Server query optimizer chooses a **scan** over an index seek, chances are it's not going to do any better than the index scan ..... – marc_s Oct 24 '16 at 11:07
  • Does it mean I should add non-clustered index to my table – user641812 Oct 24 '16 at 11:11
  • @user641812: if that nonclustered index improves your performance : ***YES!*** That's what they are there for ! – marc_s Oct 24 '16 at 11:15
  • My understanding why there is more logical reads in cluster is because cluster index contains more pages than non-cluster. AM right? – user641812 Oct 25 '16 at 08:20
1
  1. In your original CREATE TABLE where you create the PRIMARY KEY CLUSTERED, it specifies the columns to cluster on, in the order they are clustered (stored) in.
[FeatureID]
[ParameterID]
[MeasurementDateTime]

If you run a query with a WHERE clause that includes a specific FeatureID then it would be able to seek to that part of the index.

But you haven't done that in the query. You've used WHERE FeatureID LIKE 'AS%' ...

The query engine cannot seek, because that LIKE with a trailing wildcard % means it has to scan across all the FeatureIDs that start with the letters AS and then within each of those nodes in the tree see if there are records that match ParameterID = 'AP2' AND YEAR(MeasurementDateTime) = 2015.

  1. In your Non-Clustered index, you've used a different column order:
[ParameterID]
[FeatureID]

When you run the same query, it can seek because you've specified an exact ParameterID in the WHERE clause.

Ordering is important! SQL Indexes are sortof B-Tree data structures, and you can't physically store them (or traverse them) in different orderings without creating multiple indexes. Creating too many indexes can be too much overhead for the database, so yes create ones that help the majority of your queries, but don't create too many. Mostly this involves knowing what sort of queries are frequently run against your database and tuning accordingly.

Davos
  • 5,066
  • 42
  • 66