1

In the table below, how to insert rows with the first and last date of years between the START_DATE and END_DATE column?

EMPID EMPNAME START_DATE END_DATE
1001 Shivansh 2015-09-01 2018-03-31
1004 Mayank 2019-04-01 2020-06-30

The output should look as follows:

EMPID EMPNAME START_DATE END_DATE
1001 Shivansh 2015-09-01 2015-12-31
1001 Shivansh 2016-01-01 2016-12-31
1001 Shivansh 2017-01-01 2017-12-31
1001 Shivansh 2018-01-01 2018-03-31
1004 Mayank 2019-04-01 2019-12-31
1004 Mayank 2020-01-01 2020-06-30

This has to be implemented using loops as Azure Synapse Analytics doesn't support Recursive common table expressions

Dale K
  • 25,246
  • 15
  • 42
  • 71
Abhishek Dutta
  • 75
  • 4
  • 12
  • Tally table? This [answer](https://stackoverflow.com/a/66465230/1127428) shows something similar. – Dale K Mar 04 '21 at 09:35
  • @DaleK thanks but, I need this for Azure synapse & the dates are to be updated – Abhishek Dutta Mar 04 '21 at 11:40
  • @wBob I know that to accept an answer I have to click the right mark beside the answer. And I have not yet marked any answer because I already got the answer & also shared the same. – Abhishek Dutta Mar 04 '21 at 11:43

2 Answers2

1

This approach uses a numbers table and a number of date functions which are available in Azure Synapse Analytics dedicated SQL pools, including DATEFROMPARTS, DATEDIFF and YEAR.

NB This is not a recursive query. There is a loop used in the creation of the numbers table but this is done only once. Once the numbers table exists it can be used for similar scenarios, eg converting recursive CTEs to set-based approaches compatible with Azure Synapse Analytics.

DATEFROMPARTS is used to construct the first day of the year in the calculated records. I then use DATEADD to add one year, then take away one day, to get the last day of the year. DATEDIFF with year is used to determine the gap in years between the two dates and therefore the number of records that need to be added. I then UNION the original and calculated records for the full result.

IF OBJECT_ID('tempdb..#tmp') IS NOT NULL
DROP TABLE #tmp
GO

CREATE TABLE #tmp ( 
    empId       INT NOT NULL, 
    empName     VARCHAR(50) NOT NULL,
    start_date  DATE NOT NULL, 
    end_date    DATE NULL
)
GO


-- Setup test data
INSERT INTO #tmp ( empId, empName, start_date, end_date )
SELECT 1001, 'Shivansh', '2015-09-01', '2018-03-31'
UNION ALL
SELECT 1004, 'Mayank', '2019-04-01', '2020-06-30'
GO

;WITH cte AS (
SELECT *,
    DATEFROMPARTS( YEAR(start_date) + n.number, 1, 1 ) newStart
FROM #tmp t
    CROSS JOIN dbo.numbers n
WHERE n.number <= DATEDIFF( year, start_date, end_date ) 
)
SELECT 'o' s, empId, empName, start_date, 
    CASE
        WHEN YEAR(start_date) = YEAR(end_date) THEN end_date
        ELSE DATEFROMPARTS( YEAR(start_date), 12, 13 )
    END end_date
FROM #tmp
UNION ALL
SELECT 'c', empId, empName, 
    newStart AS start_date,
    CASE 
        WHEN YEAR(end_date) = YEAR(newStart) THEN end_date
        ELSE DATEADD( day, -1, DATEADD( year, 1, newStart ) ) 
    END newEnd
FROM cte
ORDER BY empId, start_date

My results:

My results

I've added the o and c to indicate original and calculated rows but you can remove that column if you like. If you do not have a numbers table already then the script I used to create this one is here. This code has been tested on an Azure Synapse Analytics dedicated SQL pool, version Microsoft Azure SQL Data Warehouse - 10.0.15554.0 Dec 10 2020 03:11:10.

wBob
  • 13,710
  • 3
  • 20
  • 37
  • But still, I have some doubts. Would take your advice as I implement it on my task. Thank you for your kind help. – Abhishek Dutta Mar 04 '21 at 18:10
  • Thank you. Happy to help, although please post a new question with sample data and expected results, I'm sure someone will be able to help you. – wBob Mar 04 '21 at 18:14
  • Yeh sure. I Will be doing that for sure – Abhishek Dutta Mar 04 '21 at 18:15
  • The 'dbo.numbers' is not available in Azure Synapse Analytics. Also, in the output above year associated with empid 1004 is not generating all the tuples. I have implemented using a loop only which takes less time complexity also. – Abhishek Dutta Mar 06 '21 at 04:09
  • Hi, you can create the numbers table with the script I have provided in the link. My revised query returns the correct result after I corrected it yesterday - look in the history, just the screenprint is wrong. My approach will run only once - your loop-based approach will execute many times for all records so will get worse for long service, say 20 years the loop will execute 20 times. As we're talking about Azure Synapse Analytics we're usually talking about big data, millions of rows, so I will try this out with some bigger datasets. – wBob Mar 06 '21 at 11:51
  • Thanks! Can you also provide some workaround for dbo.numbers used in the query? It's not working in the Azure synapse. – Abhishek Dutta Mar 07 '21 at 05:19
  • If you have a look through my answer you will see a link to a script. This script contains the definition and SQL to populate the numbers table. – wBob Mar 07 '21 at 12:17
0

I found the workaround using a loop. The steps followed are as follows:

Create a temporary table to hold initial values.

CREATE TABLE #EMP
(
    EMPID VARCHAR(10),
    EMPNAME VARCHAR(10),
    START_DATE DATE,
    END_DATE DATE
);

Insert initial values.

INSERT INTO #EMP 
SELECT '1001', 'Shivansh', '2015-09-01', '2018-03-31';

INSERT INTO #EMP 
SELECT '1004', 'Mayank', '2019-04-01', '2020-06-30';

Create the required table.

CREATE TABLE #NEWEMP
(
    EMPID VARCHAR(10),
    EMPNAME VARCHAR(10),
    START_DATE DATE,
    END_DATE DATE
);

Insert the first year date (i.e. if the START_DATE for a tuple is 2015-09-01 insert 2015-09-01 for START_DATE and 2015-12-31 for the END_DATE).

INSERT INTO #NEWEMP
SELECT EMPID, EMPNAME, START_DATE, DATEFROMPARTS(YEAR(START_DATE), 12, 31) FROM #EMP;

Similarly, Insert the last year date.

INSERT INTO #NEWEMP
SELECT EMPID, EMPNAME, DATEFROMPARTS(YEAR(END_DATE), 1, 1), END_DATE FROM #EMP;

Run a while loop till the maximum value of the difference between START_DATE and END_DATE column.

DECLARE @counter INT = 1;
DECLARE @len INT = (SELECT MAX(DATEDIFF(YEAR, START_DATE, END_DATE)) FROM #EMP);

WHILE @counter < @len
BEGIN
    INSERT INTO #NEWEMP
    SELECT EMPID, EMPNAME, DATEFROMPARTS(YEAR(START_DATE) + @counter, 1, 1), DATEFROMPARTS(YEAR(START_DATE) + @counter, 12, 31) FROM #EMP
    WHERE @counter < DATEDIFF(YEAR, START_DATE, END_DATE);
    SET @counter += 1;
END

Query the output.

SELECT * FROM #NEWEMP ORDER BY START_DATE;

Here is the Query, written in Azure Synapse Analytics. enter image description here The required output is. enter image description here

Abhishek Dutta
  • 75
  • 4
  • 12