4

I understand that in general it is useful to declare the current date as a variable:

DECLARE @CurrentDate as DateTime
SET @CurrentDate = Getdate()

My understanding is that this samples the current date/time once and records it as a static variable. This is useful to ensure that the "current time" is invariant for the duration of the query and also avoids repeatedly using Getdate() which is more costly than fetching the value of a variable.

My question is.... is this required in the case of a recursive WITH clause which appears to fetch the date on a single occasion?

For example, consider the following code:

WITH CalendarSequence as(
  SELECT Getdate() as RollingDate
  UNION ALL
  SELECT DateAdd(month, -1, RollingDate) as RollingDate
  FROM CalendarSequence
  WHERE DateAdd(month, -1, RollingDate) > Convert(date, '2016-01-01')
)
SELECT 
Year(CalendarSequence.RollingDate)*100+Month(CalendarSequence.RollingDate) as MissingYearMonth
FROM CalendarSequence
LEFT OUTER JOIN TableName
ON Year(CalendarSequence.RollingDate)*100+Month(CalendarSequence.RollingDate) = TableName.YearMonthField
WHERE TableName.YearMonthField is NULL

This query generates a temporary table of dates and compares it against an arbitrary table to highlight individual months with no activity/data.

For the purposes of this specific piece of code it doesn't matter too much if Getdate() is being used multiple times (unless the month changes during query execution!) but this is not universally true for similar queries. Part of my motivation for asking the question is to get a better sense of what the WITH function is doing and if it has any surprising behavior.

P. Hopkinson
  • 193
  • 9
  • 1
    No, but you should probably use a Calendar table instead. Afully indexed table for the next eg 50 years with fields for year, month, day, Iso and non-ISO week numbers, lables, various date representations etc, instead of rebuilding it every time won't take a lot of space. Joining with it though will be a *lot* faster than using generated date ranges – Panagiotis Kanavos Aug 03 '18 at 12:42
  • This is sort of an answer, however, I don't think it's fully addressing your question, so I'm commenting. Generally, you don't need to assign `GETDATE` to a variable. If you do, it would most likely be to use the same value for `GETDATE` throughout your batch. For example, if you have multiple statements in a single batch, the 2 statements might have a different value for `GETDATE`. If they need to be the same, you would assign the value first to a variable, and then reference the variable. If they don't need to be (or it doesn't matter), then you could simply use `GETDATE`. – Thom A Aug 03 '18 at 12:43
  • Within a query, `GETDATE()` always has only one value for all rows, so it doesn't matter how many rows your `WITH` processes, or if the time changes while it's doing so. (For a more convincing example, try `SELECT SYSUTCDATETIME() FROM hugetable`.) This is generally true for T-SQL functions, even non-deterministic ones like `RAND()`, with notable exceptions being `NEWSEQUENTIALID()` and `NEWID()`, which do take on per-row values (as well as functions obviously designed to change per row, like `ROW_NUMBER()` and `NEXT VALUE FOR`). All that said... it doesn't seem to be documented. – Jeroen Mostert Aug 03 '18 at 12:46
  • 1
    Although GetDate() appears fixed due to optimisation, and as someone said newid() is not fixed, I did see a query here where newid() appeared fixed - so if you want to guarantee getdate() is fixed, use a variable, surely it can only ever be 'just as good or better', never worse – Cato Aug 03 '18 at 12:50
  • These are all really helpful comments, thanks! – P. Hopkinson Aug 03 '18 at 13:16
  • 1
    Ah, here's some [more](https://stackoverflow.com/q/6036223/4137916) on this question that should really convince you to use a variable: multiple independent `GETDATE()`s are *not* guaranteed to have the same value within the same query, and the optimizer doesn't unify those values. I'm pretty sure you still won't be able to show a difference with only a single `GETDATE()`, as I can't think of a way to make the optimizer produce a plan where it's evaluated as if it was in different columns, but it seems very unwise to rely on it. – Jeroen Mostert Aug 03 '18 at 13:29

2 Answers2

3

Executive Summary: If you want a single value for a date/time, capture it in a variable and use it as needed. It makes your intent clear and avoids issues that may arise with software updates and other subtleties.

The following code, tested with SQL Server 2008 R2, demonstrates that not all select statements are equal. While GetDate() appears to be a runtime constant for each instance (and possibly all instances), the issue is more subtle than one might think.

GetDate() may be constant across all columns and rows.

-- Constant across all columns and rows.
with Murphy as (
  select GetDate() as A, GetDate() as B, 1 as Rows
  union all
  select GetDate(), GetDate(), Rows + 1
    from Murphy
    where A = B and Rows < 1000000 )
  select Min( A ) as MinA, Max( A ) as MaxA, Min( B ) as MinB, Max( B ) as MaxB
    from Murphy
    option ( MaxRecursion 0 );

A more efficient version uses cross join instead of recursion.

-- Constant across all columns and rows.
declare @Limit as Int = 100000;
with Ten ( Number ) as
  ( select * from ( values (0), (1), (2), (3), (4), (5), (6), (7), (8), (9) ) as Digits( Number ) ),
  TenUp2 ( Number ) as ( select 42 from Ten as L cross join Ten as R ),
  TenUp4 ( Number ) as ( select 42 from TenUp2 as L cross join TenUp2 as R ),
  TenUp8 ( Number ) as ( select 42 from TenUp4 as L cross join TenUp4 as R ),
  Numbers ( Number, A, B ) as ( select top (@Limit) Row_Number() over ( order by ( select NULL ) ),
    GetDate(), GetDate() from TenUp8 )
  select Min( A ) as MinA, Max( A ) as MaxA, Min( B ) as MinB, Max( B ) as MaxB
    from Numbers;

Then again, maybe different instances can drift apart.

-- Fails (and does not generate an execution plan).
declare @A as DateTime = GetDate();
declare @B as DateTime = @A;
declare @Trials as Int = 0;

while @A = @B
  begin
  select @A = GetDate(), @B = GetDate(), @Trials += 1;
  if @Trials % 1000 = 0
    print @Trials;
  end

select @A as A, @B as B, @Trials as Trials;

So you think that's just a set posing as a select and a real select that generates an execution plan would work differently.

-- Fails.
declare @A as DateTime = GetDate();
declare @B as DateTime = @A;
declare @Trials as Int = 0;

while @A = @B
  begin
  select @A = GetDate(), @B = GetDate(), @Trials += 1
    from ( values ( 42 ) ) as PH( A );
  if @Trials % 1000 = 0
    print @Trials;
  end

select @A as A, @B as B, @Trials as Trials;

What if the values come from a table value constructor?

-- Fails.
declare @A as DateTime = GetDate();
declare @B as DateTime = @A;
declare @Trials as Int = 0;

select @A = GetDate(), @B = @A;
while @A = @B
  begin
  select @A = A, @B = B, @Trials += 1
    from ( values ( GetDate(), GetDate() ) ) as PH( A, B )
  if @Trials % 1000 = 0
    print @Trials;
  end

select @A as A, @B as B, @Trials as Trials;

Well, all of the failures come from select statements that assign values to variables. Let's eliminate that by inserting rows into a table and checking them later. (Note: This example doesn't fail when run on SQL Fiddle with SQL Server 2017.)

-- Fails on SQL Server 2008 R2, but not on SQL Server 2017.
declare @Samples as Table ( A DateTime, B DateTime );
declare @Trials as Int = 0;
while @Trials < 100000
  begin
  insert into @Samples ( A, B ) values ( GetDate(), GetDate() )
  set @Trials += 1;
  end

select A, B
  from @Samples
  where A != B;
select Min( A ) as MinA, Max( A ) as MaxA, Min( B ) as MinB, Max( B ) as MaxB
  from @Samples;
Martin Smith
  • 438,706
  • 87
  • 741
  • 845
HABO
  • 15,314
  • 5
  • 39
  • 57
  • Your last example fails on my instance of SQL Server 2017 as well (RTM CU7), so if it doesn't on SQL Fiddle that's no more than a curiosity. Long story short: all current versions of SQL Server treat individual `GETDATE()` expressions as constants for the duration of the query, but 1) this isn't officially documented (though admittedly extremely unlikely to change in new versions) and 2) they're *separate* constants for each expression, so it's easy to get surprises. Whether those surprises are critical depends on the query, of course. – Jeroen Mostert Aug 03 '18 at 15:34
  • Of note is that your first query, which is supposedly safe, is not. The execution plan has two `GETDATE()` evaluations, and those can differ. Getting it to fail is just tedious because of the number of rows involved. Rewriting it as a check in a loop and reducing the rows generated from `1000000` to `10` makes it fail in seconds. The chance of each individual *query* failing is low (since the expressions are only evaluated once per query), but executed enough times they can all be made to fail as long as there's more than one `GETDATE()` in there. – Jeroen Mostert Aug 03 '18 at 15:50
  • Given the fact that `GETDATE()` is *almost* constant, you'd have to wonder if MS wouldn't be willing, in a future version, to go all the way and make all `GETDATE()` evaluations resolve to a single value determined at the start of the query, instead of doing this once per `GETDATE()` expression, as currently seems to be the case. This should barely influence existing queries, except for making them not behave differently every once in a few thousand runs, which I can't see as anything but a net plus. – Jeroen Mostert Aug 03 '18 at 15:55
2

I ran both approaches in SQL Server 2016. I did not find any difference. But, as @Cato mentioned in the comments, it is better to declare as a variable and pass the variable, instead of waiting for the optimizer to take care of it.

Approach 1

DECLARE @currentDate DATE = GETDATE()
;WITH CalendarSequence as(
  SELECT @currentDate as RollingDate
  UNION ALL
  SELECT DateAdd(month, -1, RollingDate) as RollingDate
  FROM CalendarSequence
  WHERE DateAdd(month, -1, RollingDate) > Convert(date, '2018-01-01')
)
SELECT * FROM CalendarSequence

Approach 2

;WITH CalendarSequence as(
  SELECT Getdate() as RollingDate
  UNION ALL
  SELECT DateAdd(month, -1, RollingDate) as RollingDate
  FROM CalendarSequence
  WHERE DateAdd(month, -1, RollingDate) > Convert(date, '2018-01-01')
)
SELECT * FROM CalendarSequence

Both of them are having the same execution plan. No changes at all.

SQL SERVER Execution Plan

Community
  • 1
  • 1
Venkataraman R
  • 12,181
  • 2
  • 31
  • 58