2

I am trying to solve a trending problem at work very similar to the below example. I think I have a method but don't know how to do it in SQL.

The input data is:

MTD         LOC_ID  RAINED
1-Apr-16    1       Y
1-Apr-16    2       N
1-May-16    1       N
1-May-16    2       N
1-Jun-16    1       N
1-Jun-16    2       N
1-Jul-16    1       Y
1-Jul-16    2       N
1-Aug-16    1       N
1-Aug-16    2       Y

The desired output is:

MTD         LOC_ID  RAINED  TRENDS
1-Apr-16    1       Y       New
1-May-16    1       N       No Rain
1-Jun-16    1       N       No Rain
1-Jul-16    1       Y       Carryover
1-Aug-16    1       N       No Rain
1-Apr-16    2       N       No Rain
1-May-16    2       N       No Rain
1-Jun-16    2       N       No Rain
1-Jul-16    2       N       No Rain
1-Aug-16    2       Y       New

I'm trying to produce the output from the input by trending on MTD without depending on it. This way, when new months are added to the input, the output changes without editing the query.

The logic for TRENDS will occur on each unique LOC_ID. Trends will have three values: "New" in the first month RAINED is "Y", "Carryover" in any following months where RAINED is "Y", and "No Rain" in any months where RAINED is "N".

I'd like to automate this problem by introducing an intermediate step with a listagg. For example, for LOC_ID = "1":

MTD         LOC_ID  RAINED  PREV_RAINED
1-Apr-16    1       Y       (null) / 0 / (I don't care)
1-May-16    1       N       Y
1-Jun-16    1       N       Y;N
1-Jul-16    1       Y       Y;N;N
1-Aug-16    1       N       Y;N;N;Y

This way, to produce "TRENDS" in the output, I can say:

case when RAINED = 'Y' then
    case when not regexp_like(PREV_RAINED, 'Y', 'i') then
        'New'
    else
        'Carryover'
    end
else
    'No Rain'
end as TRENDS

My problem is that I'm not sure how to produce PREV_RAINED for each unique LOC_ID. I have a feeling it needs to combine LAG() statements and partition by LOC_ID order by MTD, but the number of lags I need to do depends on each month.

Is there an easy way to produce PREV_RAINED or a simpler way to solve my overall problem while preserving automation each month?

Thanks for reading all of this! :)

Ryan Barker
  • 113
  • 2
  • 11
  • 1
    In the example, is the trend for loc_id=2 correct? it rained in May but not in April, so I expected a New in May. Am I wrong? – Aleksej Sep 28 '16 at 18:12
  • What version of Oracle? In 12c you could use MATCH_RECOGNIZE. –  Sep 28 '16 at 18:20
  • @Aleksej You are correct, but I messed up the input in my example. I have edited it to be correct. – Ryan Barker Sep 28 '16 at 18:43
  • @mathguy I am using version 12c. What is your solution? – Ryan Barker Sep 28 '16 at 18:43
  • For clarity on trends, consider location 1. In April and July it rained, but all other months it did not. Because April is the first time it rained, trends is marked 'New'. In May, June, and August, it did not rain, so trends is marked 'No Rain'. In July, it rained, but it had already rained in April. Therefore, trends is marked 'Carryover' for July. – Ryan Barker Sep 28 '16 at 18:51

3 Answers3

1

In the below SQL there are two parts.

(i) Calculating the ROWNUMBER value for rained attribute at loc_id,rained level.
(ii) Get the count at partition level loc_id,rained.

By computing the above two we can write the CASE WHEN logic to calculate the trends based on your requirement.

SELECT mtd,
       loc_id,
       rained,
       CASE WHEN rained = 'N' THEN 'No Rain'
            WHEN rained = 'Y' AND rn = 1 THEN 'New'
            ELSE 'Carry Over'    
        END AS Trends       
  FROM
        ( 
            SELECT mtd,
                   loc_id,
                   rained,                   
                   ROW_NUMBER() OVER ( PARTITION BY loc_id,rained ORDER BY mtd ) AS rn,
                   COUNT(*) OVER ( PARTITION BY loc_id,rained ) AS count_locid_rained               
              FROM INPUT
              ORDER BY loc_id,mtd,rained,rn
         ) X;
Teja
  • 13,214
  • 36
  • 93
  • 155
  • Give me a moment to give this a shot. I'll let you know how it goes. Thank you! – Ryan Barker Sep 28 '16 at 19:00
  • This is very close to what I need. However, I find that new is not flagging correctly in some cases because COUNT_LOCID_RAINED is incorrect. I will put the details in an answer. When you count at the partition level loc_id rained and put it into input, which count do you return? – Ryan Barker Sep 28 '16 at 19:19
  • Consider a new LOC_ID, 3. The subquery for loc_ID 3 returns: MTD LOC_ID RAINED PREV_RAINED COUNT_LOCID_RAINED 01-APR-16 3 n 3 01-MAY-16 3 n n 3 01-JUN-16 3 y n 2 01-JUL-16 3 n y 3 01-AUG-16 3 y n 2 This messes up the trends field because COUNT_LOCID_RAINED is incorrect. The main query would work if COUNT_LOCID_RAINED was – Ryan Barker Sep 28 '16 at 19:25
  • I understood the mistake.. let me rewrite the sql for computing new. – Teja Sep 28 '16 at 19:26
  • Okay. If you want to see it, I posted an answer with a formatted example of a broken case. Thank you! – Ryan Barker Sep 28 '16 at 19:32
  • This is perfect, thank you very much. Can you clarify how ROW_NUMBER() OVER ( PARTITION BY loc_id,rained ORDER BY mtd ) AS rn, works? How does row number return for the two values of "rained"? – Ryan Barker Sep 28 '16 at 19:45
  • Rownumber is being assigned in a sequential order starting from 1. In this logic I am partitioning by loc_id,rained and I am ordering on mtd in ascending order when assigning row numbers from 1 to n. – Teja Sep 28 '16 at 19:47
  • Please upvote as well if you liked the answer. thanks – Teja Sep 28 '16 at 19:48
  • The method behind the madness just clicked in my head. I also just upvoted. This is brilliant - Thank you! – Ryan Barker Sep 28 '16 at 19:50
1

Here is a solution for older versions. The WITH clause is for input data; the solution starts right after the WITH clause.

I'll work on a MATCH_RECOGNIZE solution next, I may add it to this answer.

with
     input_data ( mtd, loc_id, rained ) as (
       select to_date('1-Apr-16', 'dd-Mon-rr'), 1, 'Y' from dual union all
       select to_date('1-Apr-16', 'dd-Mon-rr'), 2, 'N' from dual union all
       select to_date('1-May-16', 'dd-Mon-rr'), 1, 'N' from dual union all
       select to_date('1-May-16', 'dd-Mon-rr'), 2, 'N' from dual union all
       select to_date('1-Jun-16', 'dd-Mon-rr'), 1, 'N' from dual union all
       select to_date('1-Jun-16', 'dd-Mon-rr'), 2, 'N' from dual union all
       select to_date('1-Jul-16', 'dd-Mon-rr'), 1, 'Y' from dual union all
       select to_date('1-Jul-16', 'dd-Mon-rr'), 2, 'N' from dual union all
       select to_date('1-Aug-16', 'dd-Mon-rr'), 1, 'N' from dual union all
       select to_date('1-Aug-16', 'dd-Mon-rr'), 2, 'Y' from dual
     )
select mtd, loc_id, rained,
       case rained when 'N' then 'No Rain'
                   else case when rn = 1 then 'New' 
                                         else 'Carryover' end
                   end  as trends
from ( select mtd, loc_id, rained, 
              row_number() over (partition by loc_id, rained order by mtd) rn
       from   input_data
)
order by loc_id, mtd
;

Output

MTD                     LOC_ID RAINED TRENDS  
------------------- ---------- ------ ---------
01/04/2016 00:00:00          1      Y New      
01/05/2016 00:00:00          1      N No Rain  
01/06/2016 00:00:00          1      N No Rain  
01/07/2016 00:00:00          1      Y Carryover
01/08/2016 00:00:00          1      N No Rain  
01/04/2016 00:00:00          2      N No Rain  
01/05/2016 00:00:00          2      N No Rain  
01/06/2016 00:00:00          2      N No Rain  
01/07/2016 00:00:00          2      N No Rain  
01/08/2016 00:00:00          2      Y New      

 10 rows selected
  • Thank you! I'd like to avoid doing something similar to your first answer though, because it requires editing the query each time a new month is added. I'd like the query to adjust to new data by itself. – Ryan Barker Sep 28 '16 at 19:33
  • @RyanBarker - ?? What part of the query requires editing when new months are added? I hope you are not talking about the WITH clause; as I said already, the solution DOES NOT INCLUDE IT. Remove it, and use the rest (`select mtd, ...` to the end) on your table (using your actual table and column names); when more rows are added, the query should work with no change whatsoever. –  Sep 28 '16 at 19:39
  • My apologies - I misread what you said. Your solution is great and works beautifully. Thank you for providing it and again, I apologize for my initial misunderstanding. – Ryan Barker Sep 28 '16 at 19:54
1

Solution using MATCH_RECOGNIZE (for Oracle 12c only). Test the different solutions on your dataset; I am told that MATCH_RECOGNIZE may be significantly faster than other solutions, but this depends on many factors.

select loc_id, mtd, rained, trends
from input_data
  match_recognize (
    partition by loc_id, rained
    order by     mtd
    measures     mtd as mtd,
                 case when rained = 'N' then 'No Rain'
                      else case when match_number() = 1 then 'New' else 'Carryover' end
                      end as trends
    pattern (a)
    define a as 0 = 0
  )
order by loc_id, mtd;