0

I've got a query I'm trying to run that runs a long time (~ 1 hour) and doesn't produce the right results. I'd like some recommendations on how to optimize this (we're going to need to run it several times a day), as well as some ideas on what could be going wrong. My current code is listed below.

I'm trying to get a full list of days from @StartDate to the current day, the amount of data migrated on each day, and the number of items migrated on the day. The day should be displayed even if there was nothing migrated on that day. The @temp table works and I can SELECT from it and get the right results. When I add the SELECT as written, though, it runs for over an hour and doesn't return the desired results.

Declare @StartDate datetime = '2015-10-01'
Declare @EndDate datetime = CAST(SYSDATETIME() as date)
declare @temp Table
  (
  DayDate datetime
  );

WHILE @StartDate <= @EndDate
begin
  INSERT INTO @temp (DayDate) VALUES (@StartDate);
  SET @StartDate = Dateadd(Day,1, @StartDate);
end ;

SELECT TOP 10 
  CAST(t.DayDate AS date) [Date], 
  SUM(cr.MESSAGE_SIZE) [AmtMigrated],
  COUNT(cr.MESSAGE_SIZE) [ItemsMigrated]
FROM CROSS_REFERENCE cr
RIGHT JOIN @temp t
  ON t.DayDate = cr.MIGRATION_DATE_TIME

GROUP BY CAST(t.DayDate AS date)
ORDER BY CAST(t.DayDate AS date) DESC
TheDoc
  • 688
  • 2
  • 14
  • 33
  • tostart user temp table with #etmp!! do you have index on the `cr.MIGRATION_DATE_TIME` column – Hiten004 Feb 22 '16 at 18:08
  • 1
    The problem is most likely because your MIGRATION_DATE_TIME has a time component while your DayDate does not. That's just a guess since you don't explain what you mean by "doesn't work". – Rabbit Feb 22 '16 at 18:11
  • What is wrong with the results that it returns? For performance, people would need to know the structure of your table, # rows, size of rows, indexes, hardware, query plan... – Tom H Feb 22 '16 at 18:13
  • Have you looked at the data in your temp table, and looked at the data in your Cross_Reference table, and seen whether it makes sense for them to join the way you are doing? – Tab Alleman Feb 22 '16 at 18:14
  • Why is @startdate a `datetime`? In fact you even cast it as a date in your GROUP BY and ORDER BY. Your results are probably wrong because you have some random time components (like 00:00:01 or something) on your `DayDate` and you looking for matching records at the same date AND TIME from your `Cross_Reference` table. You don't care about the time component here, so just make the `@STARTDATE` a date so it's a `date` in your temp table, then cast your `cr_Migration_Date_Time` as date in your `ON` clause. It still might not be terribly speedy since the cast, but your results will be right. – JNevill Feb 22 '16 at 18:19
  • Great points all around, guys. Thanks. I was able to make it work with several of your suggestions. When I get time I'll post my functioning query. – TheDoc Feb 22 '16 at 21:20

2 Answers2

1

Change your join clause to:

ON cr.MIGRATION_DATE_TIME >= t.DayDate AND cr.MIGRATION_DATE_TIME < (t.DayDate+1)

The idea is to allow any indexing on the larger table to still be used, but to make sure that you're accounting for the range of timestamps in that day.

Change @temp to a regular temp table and make DayDate the primary key (so it's indexed).

Add an index to CROSS_REFERENCE.MIGRATION_DATE_TIME if one doesn't already exist. You may also want to look into COLUMNSTORE INDEXes for CROSS_REFERENCE to help aggregate that data more quickly.

Dan Field
  • 20,885
  • 5
  • 55
  • 71
  • 1
    If performance is a concern then this is not the approach to take. I believe that converting the column to a `DATE` will ruin any chance of using an index on it. Better to do a range check by calculating the start and end of the day (basically midnight morning of and midnight of that evening) for the much smaller table's date column and comparing that way. – Tom H Feb 22 '16 at 18:26
  • instead of a variable table why dont you create a #temp table and stored all those values in there adding the fact that table variables can cause you to get less efficient execution plans [Temp tables](https://www.brentozar.com/archive/2014/04/table-variables-good-temp-tables-sql-2014/) – jthalliens Feb 22 '16 at 18:42
  • One of my suggestions is to change the table variable to a temp table. – Dan Field Feb 22 '16 at 18:51
0

what about this way, if you are going to do calculations only, you can create a materialized view, something like :

CREATE VIEW dbo.MigrationIndexedView
WITH SCHEMABINDING
AS
SELECT TOP 10 
CAST(cr.MIGRATION_DATE_TIME AS date) [Date], 
 SUM(cr.MESSAGE_SIZE) [AmtMigrated],
 COUNT(cr.MESSAGE_SIZE) [ItemsMigrated]
FROM CROSS_REFERENCE cr

GROUP BY CAST(cr.MIGRATION_DATE_TIME AS date)
ORDER BY CAST(cr.MIGRATION_DATE_TIME AS date) DESC
GO
SELECT * FROM MigrationIndexedView WHERE [Date] BETWEEN @StarDate AND @EndDate

Excuse my typos or syntax, but I think this is a good start depending on the SQL edition you are using, please take a look here for more info

jthalliens
  • 504
  • 4
  • 14