0

I'm having difficulties creating an included index on table that contains slightly over 200 million records. The structure of table is as follows:

    [Id] [int] IDENTITY(1,1) NOT NULL,
  [Name] [nvarchar](60) NOT NULL,
 [VatId] [int] NOT NULL,
[UserId] [int] NULL,
..some additional [int] columns

The problem is that when I do a following query:

set statistics time on;

 select top 20 [Id] from tblArticle where UserId = 7 order by Id desc;

set statistics time off;

..then the result is retrieved in ~27ms (there is a non-clustered index on column UserId).

However when I try to select additional columns, e.g.:

set statistics time on;

 select top 20 [Id], [VatId] from tblArticle where UserId = 8 order by Id desc;

set statistics time off;

..then the result is back in ~2,000ms.

Looking at the execution plan: enter image description here ..obviously, the Key Lookup is what takes the most of the time here.

I have tried to create an included index on VatId, such as:

CREATE NONCLUSTERED INDEX [NonClusteredIndex-UserIdIncVatId] ON [dbo].[tblArticle]
(
    [UserId] ASC
)
INCLUDE ([VatId]) 
WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, 
      SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, 
      ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON)

..but after several hours of running this query ends up with error

Insufficient memory in pool (default)

(My SQL Server instance runs on 8GB RAM, Core i7)

My question: Are there any other possible tricks to get rid of this Clustered Key Lookup and improve the performance?

Many thanks

EDIT: The column Id has a clustered index.

Calling the set statistics io on; produces the following:

Table 'tblArticle'. 
Scan count 1, 
logical reads 730, 
physical reads 1, 
read-ahead reads 1351, 
lob logical reads 0, 
lob physical reads 0, 
lob read-ahead reads 0.

EDIT 2: Just to make the full picture, execution plan with hints: enter image description here

Ivan Sivak
  • 7,178
  • 3
  • 36
  • 42
  • The problem is not the keylookup for `VatID` since it's only 20 rows, it's the finding the top rows for `Id`, and that's why you'll need that as indexed field, no need to have `VatId` if it's just top 20 – James Z Nov 03 '15 at 16:35
  • @JamesZ `Id` is a clustered index. – Ivan Sivak Nov 03 '15 at 16:40
  • Its looking for clusterd key because your are doing sorting on that key. You can use the "nolock" for good performance. "nolock" has some drawback, like direty reads. – Jigneshk Nov 03 '15 at 16:42
  • But that's just then 20 key lookups, that shouldn't take 2 seconds. What does statistics IO say? Is there something strange happening here... – James Z Nov 03 '15 at 16:42
  • Nolock does not improve performance, it just does dirty reads. If it helps your problem is blocking – James Z Nov 03 '15 at 16:42
  • @JamesZ Thanks. I updated the answer. – Ivan Sivak Nov 03 '15 at 16:48
  • why you compare different querys? one use `userid = 7` the other `userid = 8` – Juan Carlos Oropeza Nov 03 '15 at 16:49
  • @JuanCarlosOropeza The cache. When I run the same query twice the second time statistics shows almost 0ms time as it is in cache. – Ivan Sivak Nov 03 '15 at 16:51
  • 1
    I'd think that the failure to build the index is rather due to ` SORT_IN_TEMPDB = OFF` than anything else Then again, 8Gb ram is kind of low for this kind of volume... although I'm still confused about this happening; I'd expect A LOT of swapping rather than an error. As for the `NOLOCK`: don't, it's awful advice. – deroby Nov 05 '15 at 10:11
  • @deroby Thanks for the hint. I'm gonna try the `SORT_IN_TEMPDB = ON` tonight and let you know the result. – Ivan Sivak Nov 05 '15 at 10:38
  • @deroby Amazing improvement. `SORT_IN_TEMPDB = ON` did the index creation in 45 minutes. Nice. Please feel free to make your comment an answer I'm gonna upvote it. – Ivan Sivak Nov 06 '15 at 07:54
  • 1
    No worries, I'm glad it worked. I also think that it might be the simple side-effect of the machine having a bit more free ram while rerunning. I've had the same situation on my laptop once (8Gb ram, max 3Gb assigned to MSSQL) where once an operation (many many huge updates in loops, don't ask) took about an hour and a half. Couple of days later I had to rerun the thing and it ran in 25 minutes. The main difference being that the first time I had lots of programs open (browsers, Visual Studios, Compare, Excel, Outlook, etc...) while the second time around the machine was virtually 'empty'. – deroby Nov 06 '15 at 09:41
  • @deroby Good point. The first time I tried it I had lots of tasks opened as well and the index creation totally burnt my pc.Had to restart it and DB was in recovery mode. Next time I tried it I kept only sql server running but index creation ended with insufficient memory error anyway (after many hours of runing). This time however, `SORT_IN_TEMPDB = ON` did almost a magic. – Ivan Sivak Nov 06 '15 at 09:57

1 Answers1

1

Try:

WITH cte AS (
    select top 20 [Id] 
    from tblArticle 
    where UserId = 7 
    order by Id desc
)
SELECT t.[Id], t.[VatId]
FROM tblArticle t
JOIN cte 
  ON cte.[Id]= t.[Id]

Also I just came from another question where suggest create a composited index may help because wont need do the look up

oracle Update comparing Varchar

Community
  • 1
  • 1
Juan Carlos Oropeza
  • 47,252
  • 12
  • 78
  • 118
  • ah, nice. `320ms` now. Execution plans shows `Clustered Index Seek` instead of `Clustered Key Lookup`.. – Ivan Sivak Nov 03 '15 at 16:57
  • Did you try the composite index or just the cte? Since I saw that question Im wondering how will that work. – Juan Carlos Oropeza Nov 03 '15 at 16:58
  • Just the `CTE`. I'm a bit afraid of composite index as creating the regular index with 1 included `VatId` column took hours on my machine and ended up with error. That's however another issue. I will try the composite key. – Ivan Sivak Nov 03 '15 at 17:01