-1

I'm using Teradata SQL Assistant for an analysis on date variables. Basically I have individuals that receive a treatment on a given date. These individuals might receive the treatment several times a year across several years but when there is a gap of more than 12 months I effectively treat this as a division. That is, I need to take the minimum date and the maximum date of treatment before the gap of 12 months occurred. If an individual then goes onto receive the treatment again the same process is followed discounting the previous gap of 12 months.

As a sample of the table I am working with:

Individual_ID Treatment Date Treatment Received Time_Between_Treatment(Months)
A B 01/01/2010 ?
A B 01/02/2010 1
A B 01/04/2010 2
A B 01/05/2011 13
A B 01/06/2011 1
A B 01/08/2011 2
A B 01/09/2012 13
A B 01/10/2012 1
B C 01/01/2010 ?
B C 01/02/2010 1
B C 01/04/2010 2
B C 01/05/2011 13
B C 01/06/2011 1
B C 01/08/2011 2
B C 01/09/2012 13
B C 01/10/2012 1

In the above table Individual A and B would have 3 rows in the final output:

Individual_ID Treatment Minimum Date Maximum Date
A B 01/01/2010 01/04/2010
A B 01/06/2011 01/08/2011
A B 01/10/2012 ?
B C 01/01/2010 01/04/2010
B C 01/06/2011 01/08/2011
B C 01/10/2012 ?

My end goal is to take an average of the months between treatments but as I was unable to achieve this I think the above output would be the next best thing as I can start adding the average for each row and then continue my analysis.

I have attempted using various techniques involving Window Functions most notably AVG () OVER (PARTITION BY ... ORDER BY ...). In addition I have also tried using flags to separate the data into below and above 12 months but have not been able to compute the desired output since then.

I have access to SAS if anyone thinks this would be easier to achieve using this software.

Thanks in advance

GMB
  • 216,147
  • 25
  • 84
  • 135

2 Answers2

0

This is a kind of gaps-and-islands problem, where islands are consecutive identical treatments on the same individual with a lesser than 12 month gap.

We can use window functions. The idea is to build groups by comparing the current and previous treatment and their date gap. We can then aggregate by groups.

select individual_id, treatment, 
    min(date_treatment_received) min_date, max(date_treatment_received) max_date
from (  
    select t.*,
        sum(
            case when treatment = lag_treatment 
                and date_treatment_received < lag_date_treatment_received + interval '1' year 
            then 0 else 1 end
        ) over(partition by individual_id order by date_treatment_received) grp
    from (
        select t.*, 
            lag(date_treatment_received) over(partition by individual_id order by date_treatment_received) lag_date_treatment_received,
            lag(treatment) over(partition by individual_id order by date_treatment_received) lag_treatment
        from mytable t
    ) t
) t
group by individual_id, treatment, grp
order by individual_id, treatment, min_date

Demo on DB Fiddlde - this is Postgres but the syntax should work equally well in Teradata.

individual_id treatment min_date max_date
A B 2010-01-01 2010-04-01
A B 2011-05-01 2011-08-01
A B 2012-09-01 2012-10-01
B C 2010-01-01 2010-04-01
B C 2011-05-01 2011-08-01
B C 2012-09-01 2012-10-01
GMB
  • 216,147
  • 25
  • 84
  • 135
  • Awesome! Thank you I will try this as soon as I can and get back to but I think this is exactly what I'm after! – user5553440 Mar 18 '23 at 22:56
-1

Using a CTE to check if the treatment wasn't longer as 2 Month ago

you can do follwing

this is in postgres to show that it workd

If you want actuall for the last entry in a group ?

you need to calculate the difference between min and max date like https://docs.teradata.com/r/Teradata-Database-SQL-Functions-Operators-Expressions-and-Predicates/March-2017/DateTime-and-Interval-Functions-and-Expressions/Arithmetic-Operators-and-ANSI-DateTime-and-Interval-Data-Types/Examples/Example-Calculating-the-Difference-in-Days-Between-DATE-Values and see if the threshold is here 2 months is passed

WITH CTE AS (SELECT
"Individual_ID", "Treatment", "Date_Treatment_Received", "Time_Between_Treatment(Month)"
  , CASE WHEN "Time_Between_Treatment(Month)" = '?' OR
  "Time_Between_Treatment(Month)" :: int > 2 then 1 ELSE 0 END rnk
FROM Table1), CTE2 AS(
SELECT "Individual_ID", "Treatment", "Date_Treatment_Received", "Time_Between_Treatment(Month)",
SUM(rnk) OVER(PARTITION BY "Individual_ID", "Treatment" ORDER BY "Date_Treatment_Received") rnk
FROM CTE)
SELECT "Individual_ID", "Treatment", MIN("Date_Treatment_Received") as "Minimum Date", MAX("Date_Treatment_Received")  aS "Maximum Date"
FROM CTE2
GROUP BY "Individual_ID", "Treatment", rnk
ORDER By "Individual_ID", "Treatment","Minimum Date"
Individual_ID Treatment Minimum Date Maximum Date
A B 2010-01-01 01:00:00 2010-01-04 01:00:00
A B 2011-01-05 01:00:00 2011-01-08 01:00:00
A B 2012-01-09 01:00:00 2012-01-10 01:00:00
B C 2010-01-01 01:00:00 2010-01-04 01:00:00
B C 2011-01-05 01:00:00 2011-01-08 01:00:00
B C 2012-01-09 01:00:00 2012-01-10 01:00:00
SELECT 6

fiddle

nbk
  • 45,398
  • 8
  • 30
  • 47