2

I have a simple query:

select top 10 * 
FROM Revision2UploadLocations r2l
inner join Revisions r on r2l.RevisionId = r.Id
INNER JOIN [Databases] [D] on [R].[DatabaseId] = [D].[Id]
INNER JOIN [SqlServers] [S] on [D].[InstanceId] = [S].[Id]
where --r.ValidationStatus in (2, 3) and 
r2l.[ChecksumWasSent] = 0 AND r2l.Status = 2

This query is usually executed for 0.5s: execution plan 1

But the same query with uncommented condition is executed for 5s (!!!) and have a very strange execution plan (Revisions and SqlServers are joined although they have no linked columns and the most selective condition "r2l.[ChecksumWasSent] = 0 AND r2l.Status = 2" is executed at the end of query processing: Execution plan 2

ValidationStatus is ordinary int not null column. Columns Revision2UploadLocations.RevisionId, Revisions.DatabaseId, Databases.InstanceId are indexed. Here is description of tables:

CREATE TABLE [SqlServers]
(
    [Id] int identity(1,1) NOT NULL CONSTRAINT PK_SqlServers PRIMARY KEY,
...
)

CREATE TABLE [Databases](
    [Id] int identity(1,1) NOT NULL CONSTRAINT PK_Databases PRIMARY KEY,
    [InstanceId] int NOT NULL,
    [Name] nvarchar(128) NOT NULL,
...
    CONSTRAINT FK_Databases_SqlServers FOREIGN KEY ([InstanceId]) REFERENCES [SqlServers]([Id])
)

CREATE INDEX [IX_Databases_DatabaseId] ON [Databases] ([InstanceId] ASC)

CREATE TABLE [Revisions]
(
    [Id] int identity(1, 1) NOT NULL,
    [DatabaseId] int NOT NULL,
    [BackupStatus] tinyint NOT NULL,
    [ValidationStatus] tinyint NOT NULL,
...
    CONSTRAINT PK_Revisions PRIMARY KEY([Id]),
    CONSTRAINT FK_Revisions_Databases FOREIGN KEY ([DatabaseId]) REFERENCES [Databases]([Id])
)

CREATE INDEX [IX_Revisions_DatabaseId] ON [Revisions] ([DatabaseId] ASC)

CREATE TABLE [Revision2UploadLocations]
(
    [Id] int NOT NULL IDENTITY (1, 1) CONSTRAINT PK_Revision2UploadLocations PRIMARY KEY,
    [Status] int NOT NULL,
    RevisionId int NOT NULL,
    [ChecksumWasSent] bit NOT NULL,
    CONSTRAINT FK_r2l_Revisions FOREIGN KEY ([RevisionId]) REFERENCES [Revisions]([Id])
)

CREATE INDEX [IX_Revision2UploadLocations_RevisionId] ON [Revision2UploadLocations] ([RevisionId] ASC)

How I can improve performance of this query?

EDIT Now I have some more details: Some tables (SqlServers and Databases) have 1-10 records, but Revisions and Revision2UploadLocations) have 500K+ records, so query optimize decide to use full scan instead index search for small tables and take it first. Query Performance Tuning (SQL Server Compact):

A small table is one whose contents fit in one or just a few data pages. Avoid indexing very small tables because it is typically more efficient to do a table scan.

As a temprary solution I tried to use query hint FORCE ORDER: Query Hint (SQL Server Compact) and response time decreased from 5sec to 0.5sec.

But I don't think that it's a good solution.

Yuriy Gavrishov
  • 4,341
  • 2
  • 17
  • 29

2 Answers2

0

I have found in the past if I insert first to a temp table the first part of your query, with the field you want to further filter on ("ValidationStatus"], then query your temp table the performance/speed is much better. So the initial query would be this:

select * 
into #tmp
FROM Revision2UploadLocations r2l
inner join Revisions r on r2l.RevisionId = r.Id
INNER JOIN [Databases] [D] on [R].[DatabaseId] = [D].[Id]
INNER JOIN [SqlServers] [S] on [D].[InstanceId] = [S].[Id]
where --r.ValidationStatus in (2, 3) and 
r2l.[ChecksumWasSent] = 0 AND r2l.Status = 2

then the final select would be:

select * from #tmp 
where ValidationStatus in (2,3)

No need for indexes, and I know its weird how the optimizer doesn't always work but this approach has been useful to me several times in the past.

  • I edited the code since I had just originally copied and pasted from the OPs code given. The "select top 10 *" should have just been select *. But anyways the point is to not filter on ValidationStatus in (2,3) until after its been inserted to the temp table. – Geoffrey Fernandez Oct 07 '15 at 13:45
  • Sorry, but I think that this way is analogue of FORCE ORDER option. We strictly set up the order of table processing. As for me the right way is to give more information to query optimizer and allow him to decide how to process tables. – Yuriy Gavrishov Oct 08 '15 at 11:33
0

The Geoffrey's solution doesn't give you the expected result. The first statement selects 10 rows without garanties that their r.ValidationStatus are 2 or 3. So finaly, you can get less than 10 rows (or even no rows at all). I think you can rewrite you query as this:

SELECT top 10 *     
FROM Revisions r
  INNER JOIN Revision2UploadLocations r2l 
    ON r2l.RevisionId = r.Id
      AND r2l.[ChecksumWasSent] = 0 
      AND r2l.Status = 2
  INNER JOIN [Databases] [D] on [D].[Id] = [R].[DatabaseId]
  INNER JOIN [SqlServers] [S] on [S].[Id] = [D].[InstanceId]
WHERE r.ValidationStatus in (2, 3)

And if r2l.[ChecksumWasSent] datatype is bit (boolean) with :

  • more 0 than 1, you can create an index on RevisionId + Status
  • very much more 1 than 0, you can create and inde RevisionId + ChecksumWasSent + Status
Khonsort
  • 483
  • 4
  • 6
  • I have tried this variant and some other similar but Query Optimizer is enough smart to detect identity of these queries but unfortunately not so smart to build optimal plan. I will try suggested indexes, thank you! – Yuriy Gavrishov Oct 08 '15 at 05:39
  • Have you got an index on `Revision.ValidationStatus` ? it can help too. How many differents status have you ? Which is the distribution of rows between the differents status ? – Khonsort Oct 08 '15 at 07:45
  • I have tried indexes on (RevisionId, Status) and (RevisionId, ChecksumWasSent, Status). It's very strange but QA is still using full scan for Revision and only then Index seek for Revision2UploadLocations. Condition for Revision2UploadLocations is much more selective. ValidationStatus have the following value density: 1 (811), 2(441252), 4 (4). – Yuriy Gavrishov Oct 08 '15 at 11:49
  • How many rows there are in `Revisions`and `Revision2UploadLocations` ? And do you need all columns for all tables (*) ? If no you can prefix the star by the table alias – Khonsort Oct 08 '15 at 12:00
  • If there is a great number of rows in `Revision2UploadLocations` and very less row with *ChecksumWasSent=0* than *ChecksumWasSent=1* and if your SqlServer release/version support it, you can create a filtrered index on `RevisionId + Status` where ChecksumWasSent=0 – Khonsort Oct 08 '15 at 12:07
  • Yes, I need 5-6 columns from both tables in the result output. – Yuriy Gavrishov Oct 08 '15 at 12:54
  • Unfortunately SQL Server Compact Edition does not support filtered indexes as I know... – Yuriy Gavrishov Oct 08 '15 at 12:55
  • Sorry I haven't seen that you had already specified the number of rows for the 2 main tables. What is the distribution for `ChecksumWasSent` and `Status`columns ? – Khonsort Oct 08 '15 at 13:14
  • Thank you for your help! `Revisions` - 442067, `Revision2UploadLocations` - 638482, `Revision2UploadLocations` (`ChecksumWasSent`=1) - 315975, `Status`: 0(2576), 2(632364), 4 (3542) – Yuriy Gavrishov Oct 08 '15 at 13:21
  • If I understand, the query targets about 99% of `Revisions` rows the and 50% of the `Revision2UploadLocations` rows. I thinks it normal that optimizer choose a full scan. But you need 10 rows so it can be interresting to use hint `OPTION (FAST 10)`. Have a look at [Taking A Hint](http://bradsruminations.blogspot.fr/2010/08/taking-hint.html) – Khonsort Oct 08 '15 at 13:48
  • Sorry SQL Server Compact Edition has only one hint - `FORCE ORDER`. `OPTION (FAST 10)` is a feature of full edition of SQL Server. – Yuriy Gavrishov Oct 08 '15 at 14:05