1

I have an INT column in my table. This column stores the date in special format. I'm trying to convert that column to DATE type.

For example, we keep '2016-03-14' as 20160314.

The exception is, for the last of each month, we do not store day. So for '2016-03-31' we store 201603 and I have to consider if the number is less than 999999 or not to find is the number represents a month-end or other days in month.

So far, I have 2 queries to achieve this task:

Query 1:

This is all mathematics formula.

declare @k int = 20160321
--declare @k int = 201603
select 
    IIF(@k < 999999
        , EOMONTH(DATEFROMPARTS(@k /100, @k % 100, 1), 0)
        , DATEFROMPARTS(@k /10000, (@k / 100) % 100, @k % 100)
    )

Query 2: This one is using string manipulation.

declare @k int = 20160321
--declare @k int = 201603

select 
    IIF(@k < 999999
        , EOMONTH(cast(LEFT(@k, 4) + '-' + RIGHT(@k, 2) + '-01'  as date), 0)
        , cast(LEFT(@k, 4) + '-' + RIGHT(LEFT(@k, 6), 2) + '-' + RIGHT(@k, 2) as date )
    ) AS DateColumn

I need to do the conversion formulas in WHERE clause. Something like:

SELECT K, Dt, Name -- and more fields
FROM tbl
WHERE IIF(K < 999999
            , EOMONTH(DATEFROMPARTS(K /100, K % 100, 1), 0)
            , DATEFROMPARTS(K /10000, (K / 100) % 100, K % 100)
        ) < GetDate()

And performance is important

Question: Is there a better way to do this? Possibly a ways that SQL Server can use the clustered index I have on K column.

FLICKER
  • 6,439
  • 4
  • 45
  • 75
  • @hvd He's not. He's using the EOMONTH() function which returns the date of the last day of the month. – Bacon Bits Apr 27 '16 at 21:04
  • @hvd - i think he got that covered by calling the `EOMONTH` function – YS. Apr 27 '16 at 21:05
  • Yes guys. there might be some other ways very similar to what I have done. the beast case would be doing that so SQL Server can uses the clustered index of have on K column. I add this to body of question. – FLICKER Apr 27 '16 at 21:07
  • @hvd No, he's saying in the example of June, he'd want the 30th. I don't think his problem is deciding what to do for 7 months out of the year. – Bacon Bits Apr 27 '16 at 21:11
  • 1
    @hvd is right, OP states 30th March as the final day of the month. Probably just a typo though, we all do it – Rich Benner Apr 27 '16 at 21:12
  • Right, that's a typo. I fix it, I think the question is clear and if a number is not right you can disregards it. – FLICKER Apr 27 '16 at 21:13
  • If this is just a conversion that you will be running once, then really any method you use that gives you the desired results are going to be all about the same. On the other hand, if this is going into a trigger or something that will run repeatedly then I would try to stay away from the string manipulation if possible. If that is the case, I would look at at least adding another column to store the date as an actual `date` type so you can avoid doing this conversion constantly. – gmiley Apr 27 '16 at 21:14
  • 2
    I'd probably start off by splitting the query with 2 queries: `k < 999999` and `k >= 999999` and combine them using `UNION ALL`. To make best use of your clustered index (making it SARG-able) - you might want consider bringing conversion to the other side (eg - converting the `GETDATE()` to `INT` instead) – YS. Apr 27 '16 at 21:16
  • If storing the date in the table directly is not an option, an alternative might be to do the conversion in the opposite direction: convert your date range to integer ranges. That's likely to let the existing index on `K` be used. (In other words, what YS. just said two seconds earlier.) –  Apr 27 '16 at 21:16
  • It is not a one time process/query. It is in a data warehouse and we don't want to change schema. Assume queries will run in daily basis to it should perform well. – FLICKER Apr 27 '16 at 21:17
  • Is having indexed view a viable alternative ? – YS. Apr 27 '16 at 21:18
  • @YS. if by using UNION ALL, I could make server to use index will be very good. there is no indexed view – FLICKER Apr 27 '16 at 21:18
  • 2
    Off-topic, but typos like this `It is in a data whorehouse` make my day. =) – gmiley Apr 27 '16 at 21:18
  • Back on topic, your first query will probably be the most efficient, but you should do an explain plan on your different queries to see how they perform and where the biggest costs are.Also, I am not as familiar with SQL Server as I am with Oracle, but it is usually preferred to use `CASE` statements instead of an `IIF`, if for no other reason than readability and portability. But if that is not a concern then you can stick with the `IIF`. – gmiley Apr 27 '16 at 21:27
  • @gmiley Last time I remember the topic being discussed, `IIF` unrolls internally into a `CASE` expression. They're not just functionally identical; they're actually identical. – Bacon Bits Apr 27 '16 at 21:31
  • Yep, that is true, but like I said, preferred for readability and portability. `IIF` may not exist if this code ever needs to be run in a different DBMS. That is usually not an issue people will run into. I was just making an observation on the use of `CASE` over `IFF`. – gmiley Apr 27 '16 at 21:34
  • I like IIF because of more readability. As I have read, IIF and CASE have the same performance, not sure though. Actualy IIF will convert to CASE when executing. – FLICKER Apr 27 '16 at 21:36
  • You have to love the irony here. *I store this date in a really bad INT format, and then I have to jump through all these hoops because I need it back in an actual DATE, and there are all these kludgy function calls that slow things down and make the code really complicated, **and performance is important**.* If **performance is important**, store DATE values as DATE values and not some other inappropriate type that you have to convert back and forth. – Ken White Apr 28 '16 at 00:26
  • You are absolutely right, but in real word sometimes we can not change the fact. – FLICKER Apr 28 '16 at 00:53

1 Answers1

1

I would expect that Query 1 would perform better, but you'd have to test it to be sure. I have no idea what kind of performance datefromparts() and datetimefromparts() have. They're relatively new, so it wouldn't shock me if they were magically terrible for no good reason. You're comparing performance of string manipulation and type casting vs the performance of arithmetic and type casting. My guess is it's mostly a wash, but that arithmetic is probably faster.

The options that strike me for a solution performance-wise are: a) Add a datetime column to your table. b) Add a computed column to the table. If you make a PERSISTED column, you could even create an index on it. c) Create a view (an indexed view if you can jump through the requirement hoops). d) Create a new table with the datetime field and update it.

Both (a) and (d) duplicate data, so they're not as great of solutions as they first appear.

I always have found computed columns to be a little gross, but they work well enough. If you create a view, you'll have to INNER JOIN it back to use it, but in most systems JOINS are very fast.

I'd probably look at creating a persisted computed column or a view. The best solution, of course, is to not store dates as integers in the first place.


You might try this query:

SELECT CASE 
        WHEN @K < 999999 THEN EOMONTH(TRY_CONVERT(date, CAST(@K * 100 + 1 AS VARCHAR(10))))
        ELSE TRY_CONVERT(date, CAST(@K AS VARCHAR(10)))
    END AS K_Date

The reason this might work is that YYYYMMDD is one of the ISO date formats. I'd try TRY_CONVERT() before CONVERT() because the query engine can decide to evaluate all the CONVERT without looking at the CASE conditional and throw errors.

FLICKER
  • 6,439
  • 4
  • 45
  • 75
Bacon Bits
  • 30,782
  • 5
  • 59
  • 66
  • Thanks @Bacon. So you don't know any other formula/query different that what I have implemented? – FLICKER Apr 27 '16 at 21:34
  • I'll try this and see how it will perform. I also fixed the syntax error in your query – FLICKER Apr 27 '16 at 21:50
  • @FLICKER Actually, I don't think your edit was right. If @K is over 999999, then it's already the right value to try to convert. You don't want to turn 20160405 into 2016040501. However, I was still missing a parentheses. It is too late in the day for me! – Bacon Bits Apr 27 '16 at 21:54
  • still there was a missing parenthesis. fixed now. tested – FLICKER Apr 27 '16 at 21:57