I've got a very large (100,000,000+ rows), poorly performing table in the following format
TABLE [dbo].[Emails](
[pkid] [int] IDENTITY(1,1) NOT NULL,
[cid] [int] NULL,
[email] [nvarchar](255) NULL,
[lid] [int] NULL,
[date] [smalldatetime] NULL
)
[There's a clustered index on pkid
, a Unique non-clustered index on cid
+email
+lid
, and a non-unique non-clustered index on cid
alone]
For a given cid
, there can be several entries with the same email
, and different lid
values.
cid, email, lid, date
123, mal@serenity.fake, 456, 2014-07-17 12:21:00
123, mal@serenity.fake, 459, 2014-07-17 12:26:00
123, mal@serenity.fake, 466, 2014-07-17 12:27:00
123, zoe@serenity.fake, 456, 2014-07-17 12:21:00
123, zoe@serenity.fake, 467, 2014-07-17 12:28:00
For a given cid
, I want to extract the first 10 unique emails, and concatenate the rest of the data into a list of lids and either a list of dates or a min date field.
I would also like to return the pkid
of the last data row selected, for paging
so for example
mal@serenity.fake, [list of lids], [list of dates], [last pkid for all rows]
zoe@serenity.fake, [list of lids], [list of dates], [last pkid for all rows]
wash@serenity.fake, [list of lids], [list of dates], [last pkid for all rows]
jane@serenity.fake, [list of lids], [list of dates], [last pkid for all rows]
kaylee@serenity.fake, [list of lids], [list of dates], [last pkid for all rows]
inara@serenity.fake, [list of lids], [list of dates], [last pkid for all rows]
book@serenity.fake, [list of lids], [list of dates], [last pkid for all rows]
simon@serenity.fake, [list of lids], [list of dates], [last pkid for all rows]
river@serenity.fake, [list of lids], [list of dates], [last pkid for all rows]
early@intruder.fake, [list of lids], [list of dates], [last pkid for all rows]
What is the most efficient way of doing this? The way this is currently implemented seems horribly inefficient and I'd like to improve it:
SELECT TOP (@rows) * FROM (
SELECT TOP (@rows) * FROM (
SELECT TOP (@rows * @page)
email,
MIN([date]) as [date],
COUNT(lid) as [lids],
[ids] = STUFF((SELECT ',' + CAST(lid AS VARCHAR(8)) + ';' + CAST(date as VARCHAR(20)) FROM Emails WHERE cid = @cid AND email = Z.email FOR XML PATH('')), 1, 1, '')
FROM Emails Z (NOLOCK)
WHERE cid = @cid GROUP BY email ORDER by MIN([date]) DESC
) as [a] ORDER BY [date]
) [b] ORDER BY [date] DESC
As you can see, if I'm trying to get to page 5 of these results, it's selecting all 5 pages (50 emails, which could be 500+ rows of data), and then the 10 from the bottom. There has to be a better way to do this!