1

Could anyone let me know the ways to improve the performance of the SQL below?

select A, B from T_XXX
where isnull(DateX, DateY) >= getdate()
  • No index on either DateX or DateY.
  • Currently No info about the proportion of NULL and non-NULL values in DateX.

(Added Comment) Thanks for the replies! If you consider "No Indexes" as prerequisite, is there any room for improvement of the SQL?

yobioo
  • 79
  • 11
  • `No index on either DateX or DateY.` isn't the answer creating them? – artm Feb 23 '17 at 01:55
  • 1
    You could create a computed column that uses `isnull(DateX, DateY)` as it's value. – DavidG Feb 23 '17 at 02:01
  • @artm Thanks! I added the comment. Sorry for the ambiguity. – yobioo Feb 23 '17 at 04:50
  • @DavidG Thanks for your advice! I'll negotiate our DBA after calculating the space for the computed column with the index and additional time required to update the table with them. – yobioo Feb 23 '17 at 04:53

2 Answers2

2

The simplest way to improve the performance is to create an index on ISNULL(DateX, DateY). Having individual indices on DateX and/or DateY probably won't help since the ISNULL function will likely negate the ability to use the index.

If you did have an index on either column you could rewrite the clause as

WHERE DateX >= GETDATE() OR (DateX IS NULL AND DateY >= GETDATE())

which could take advantage of either index.

SQL Server might internally convert ISNULL into an equivalent clause that would use the indices, but you'd have to try it both ways and see if there is a performance improvement to be certain,'

If you consider "No Indexes" as prerequisite, is there any room for improvement of the SQL?

Not directly. SQL is declarative, meaning you give it an idea of what you want and it comes up with what it thinks is the best plan to gather the data. There's nothing obvious that you can do to change performance just by changing the SQL in this case.

D Stanley
  • 149,601
  • 11
  • 178
  • 240
  • Is there a direct way to create index for `ISNULL(DateX, DateY)`? Do you mean a computed column with index? – Eric Feb 23 '17 at 02:38
  • D Stanley, thanks so much for the answer! Though our DBAs won't easily allow to add indexes considering ETL duration and disk spaces, but I'll negotiate with them if there is no other ways. – yobioo Feb 23 '17 at 02:42
  • @yobioo There is definitely a tradeoff between select performance and insert/update performance when you add indices. It may not be something the DBAs are willing to sacrifice unless you can prove that the overall savings in significant. – D Stanley Feb 23 '17 at 04:44
  • @Eric, @D Stanley It seems SQL Server requires a computed column to do this while Oracle/Postgresql allows index on expressions. – yobioo Feb 23 '17 at 05:47
  • @Eric Looks like I was wrong, you do need a computed column - I am surprised that it doesn't work in SQL Server. – D Stanley Feb 23 '17 at 14:23
1

You might be able to improve this via a UNION:

select A, B from T_XXX
where DateX >= getdate()

UNION

select A, B from T_XXX
where DateY >= getdate() and DateX IS NULL

But you'll still likely need indexes to get the most from this.

Joel Coehoorn
  • 399,467
  • 113
  • 570
  • 794
  • Actually, without indices this would likely be _two_ table scans, unless the engine is smart enough to bucket the results appropriately as it scans. – D Stanley Feb 23 '17 at 04:48
  • @Joel Thanks for the trick! I didn't even think of it. In my current environment, it took more time than the original one probably because of what D Stanley pointed out and/or the cost of deduplication. – yobioo Feb 23 '17 at 05:09
  • I'll add filtered indexes on each table for non null data. This will kept indexes smaller than not filtered counterpart, and only one read. – Horaciux Feb 23 '17 at 05:27
  • @Horaciux Thanks for your advice! Actually, I had been unfamiliar with filtered index and have just learned about it. It seems to be effective for our situation. – yobioo Feb 23 '17 at 08:08