3

i have the following SQL Query which runs on SQL Server CE 4

SELECT [Join_ReleaseMinDatePost].[FK_MovieID]
  FROM ( 
          SELECT [FK_MovieID], MIN([DatePost]) AS [ReleaseMinDatePost] 
          FROM [Release] 
          GROUP BY [FK_MovieID] 
        ) [Join_ReleaseMinDatePost]
  INNER JOIN  
        ( 
          SELECT COUNT([ID]) AS [FolderCount], [FK_MovieID] 
          FROM [MovieFolder] 
          GROUP BY [FK_MovieID] 
        )  [Join_MovieFolder]
    ON [Join_MovieFolder].[FK_MovieID] = [Join_ReleaseMinDatePost].[FK_MovieID]

this query takes a long time to execute but if i change the Part

SELECT COUNT([ID]) AS [FolderCount], [FK_MovieID] FROM [MovieFolder] GROUP BY [FK_MovieID]

To

SELECT 1 AS [FolderCount], [FK_MovieID] FROM [MovieFolder]

So the full query becomes

SELECT [Join_ReleaseMinDatePost].[FK_MovieID]
FROM ( SELECT [FK_MovieID], MIN([DatePost]) AS [ReleaseMinDatePost] FROM [Release] GROUP BY [FK_MovieID] ) [Join_ReleaseMinDatePost]
INNER  JOIN  (SELECT 1 AS [FolderCount], [FK_MovieID] FROM [MovieFolder] ) [Join_MovieFolder]
ON [Join_MovieFolder].[FK_MovieID] = [Join_ReleaseMinDatePost].[FK_MovieID]

then the performance becomes very fast.
the problem is that the part that was changed if taken by itself is pretty fast. but for some reason the execution plan of the first query shows that the "actual number of rows" in the index scan is 160,016 while the total number of rows in the table MovieFolder is 2,192. and the "Estimated number of rows" is 2,192.
so i think the problem is in the number of rows but i cant figure out why its all messed up.
any help will be appreciated :) thanks

the schema of the tables is below

CREATE TABLE [Release] (
  [ID] int NOT NULL
, [FD_ForumID] int NOT NULL
, [FK_MovieID] int NULL
, [DatePost] datetime NULL
);
GO
ALTER TABLE [Release] ADD CONSTRAINT [PK__Release__0000000000000052] PRIMARY KEY ([ID]);
GO
CREATE INDEX [IX_Release_DatePost] ON [Release] ([DatePost] ASC);
GO
CREATE INDEX [IX_Release_FD_ForumID] ON [Release] ([FD_ForumID] ASC);
GO
CREATE INDEX [IX_Release_FK_MovieID] ON [Release] ([FK_MovieID] ASC);
GO
CREATE UNIQUE INDEX [UQ__Release__0000000000000057] ON [Release] ([ID] ASC);
GO

CREATE TABLE [MovieFolder] (
  [ID] int NOT NULL  IDENTITY (1,1)
, [Path] nvarchar(500) NOT NULL
, [FK_MovieID] int NULL
, [Seen] bit NULL
);
GO
ALTER TABLE [MovieFolder] ADD CONSTRAINT [PK_MovieFolder] PRIMARY KEY ([ID]);
GO
CREATE INDEX [IX_MovieFolder_FK_MovieID] ON [MovieFolder] ([FK_MovieID] ASC);
GO
CREATE INDEX [IX_MovieFolder_Seen] ON [MovieFolder] ([Seen] ASC);
GO
CREATE UNIQUE INDEX [UQ__MovieFolder__0000000000000019] ON [MovieFolder] ([ID] ASC);
GO
CREATE UNIQUE INDEX [UQ__MovieFolder__0000000000000020] ON [MovieFolder] ([Path] ASC);
GO
Sparky
  • 14,967
  • 2
  • 31
  • 45
Karim
  • 6,113
  • 18
  • 58
  • 83
  • What is the purpose of the query? And why do you need to inner join a list of counts of each movie, when you don't use the count for anything? – Alex Oct 29 '11 at 21:27
  • this is a part of a bigger query, but i isolated the part that was making the slowdown and put it here so its possible for other people to unerstand it :) – Karim Oct 29 '11 at 21:31
  • u can change the SELECT to "SELECT [Join_ReleaseMinDatePost].[FK_MovieID], [ReleaseMinDatePost], [FolderCount]" so it makes more sense :) – Karim Oct 29 '11 at 21:41

3 Answers3

1

I think you're running into a correlated subquery problem. The query part you're experimenting with is part of a JOIN condition, so it is fully evaluated for every potentially matching row. You're making your SQL engine do the second 'GROUP BY' for every row produced by the FROM clause. So it's reading 2192 rows to do the group by for each and every row produced by the FROM clause.

This suggest you're getting 73 rows in the FROM clause grouping (2192 * 73 = 160 016)

When you change it to do SELECT 1, you eliminate the table-scan read for grouping.

DaveE
  • 3,579
  • 28
  • 31
  • thanks for clarification but how can this be fixed? also i think this problem only exists in sql server compact edition and the not hte big one (not sure though) – Karim Oct 29 '11 at 21:35
  • also i dont understand why the "SELECT 1 , [FK_MovieID] FROM [MovieFolder]" Eliminate the table scan? , it should get the FK_MovieID from the table i think. – Karim Oct 29 '11 at 21:37
  • i have read about the correltaed subquery and i dont think this queries are correlated since they dont include anyreference to any item in the main query , also i have read that sql server compact edition dont support correlated subquery – Karim Oct 29 '11 at 21:55
  • CE might not support correlated subqueries directly, but the SQL as you have it written *is* doing table scans to do the secondary GROUP BY. When you change it to a simple SELECT, the table scans don't have to happen to supply the data for the GROUP BY. Believe it or not, but your description matches. – DaveE Oct 30 '11 at 04:33
  • i now understood the problem and i think the solution is to create a normal table that will be used as temporary, unless you can specify an index to be created to be used in the subquery, but i dont think this is possible, is it? – Karim Oct 30 '11 at 12:26
  • I don't think you can create an index that by itself will help this particular query. It looks like a work table is the way to go. – DaveE Oct 31 '11 at 22:45
0

DaveE is right about the issue with your correlated subquery. When these issues arise you often need to rethink your entire query. If anything else fails you can probably save time extracting your sub-query to a temporary table like this:

/* Declare in-memory temp table */
DECLARE @Join_MovieFolder TABLE ( 
     count INT,
     movieId INT )

 /* Insert data into temp table */
 INSERT INTO @Join_MovieFolder ( count, movieId ) 
 SELECT COUNT([ID]) AS [FolderCount], [FK_MovieID] 
          FROM [MovieFolder] 
          GROUP BY [FK_MovieID] 


 /* Inner join the temp table to avoid excessive sub-quering */
SELECT [Join_ReleaseMinDatePost].[FK_MovieID]
  FROM ( 
          SELECT [FK_MovieID], MIN([DatePost]) AS [ReleaseMinDatePost] 
          FROM [Release] 
          GROUP BY [FK_MovieID] 
        ) [Join_ReleaseMinDatePost]
  INNER JOIN @Join_MovieFolder 
    ON @Join_MovieFolder.movieId = [Join_ReleaseMinDatePost].[FK_MovieID]
Alex
  • 2,011
  • 3
  • 21
  • 27
  • there is no temporary table as far as i know in sql server compact edition. – Karim Oct 29 '11 at 21:56
  • Argh... missed the CE part. Nope that's no option for you then. Sorry. I suggest you consider if you can create your query without the sub-query then... – Alex Oct 29 '11 at 22:12
0

i think if found the problem.
but i would like people to tell me if this indeed the problem.
the problem is that the 2 sub queries create some kind of like temp table (dont know how to call it).
but these 2 temp table dont contain a clustered index on [FK_MovieID].
so when the external join tries to join them it need to scan them several times and and this is mainly the problem.
now if i can only fix this ?

Karim
  • 6,113
  • 18
  • 58
  • 83