3

I have data like below. In this I need to find rows where the diff in their months should be >= 6. The logic should be, we need to compare from first row until we got the row where diff in months is 6, then need to follow the same logic from the newly matched row and it goes on.

I am adding below my sample data and expected output.

Fromdate    LectureiD   StudentID   Diff Months Expected
1-Oct-13    1102          55586         null
15-Oct-13   1102          55586          0
15-Oct-13   1102          55586          0
4-Apr-14    1102          55586          6
19-Dec-14   1102          55586          8
27-Dec-14   1102          55586          0
14-Jan-15   1102          55586          0
14-Jan-15   1102          55586          0
29-Sep-15   1102          55586          8
1-Oct-13    1102          55557          null
15-Oct-13   1102          55557          0
15-Oct-13   1102          55557          0
4-Apr-14    1102          55557          6
19-Dec-14   1102          55557          8

Below is the logic I tried by using analytical function in oracle.

select lectureid, 
       studentid,
       floor(months_between(fromdate, 
                            lag(fromdate) over (partition by
                                                   lectureid, 
                                                   studentid 
                                                order by fromdate
                                               )
                           )
            ) monthdiff 
 from above_table;

Since the lag function will have the default offset of 1 it just checks the row only prior to that because of that the logic I mentioned above is not working properly here since that needs to checked against their prior rows dynamically based on the newly matched row. So the o/p am getting here is as below.

The rows highlighted with asterisk is getting wrong because of this since it compares only with its immediate prior row.

Fromdate    LectureiD   StudentID   Diff Month
1-Oct-13    1102           55586        null
15-Oct-13   1102           55586        0
15-Oct-13   1102           55586        0
*4-Apr-14   1102           55586        5*
19-Dec-14   1102           55586        8
27-Dec-14   1102           55586        0
14-Jan-15   1102           55586        0
14-Jan-15   1102           55586        0
29-Sep-15   1102           55586        8
1-Oct-13    1102           55557        null
15-Oct-13   1102           55557        0
15-Oct-13   1102           55557        0
*4-Apr-14   1102           55557        5*
19-Dec-14   1102           55557        8

Any help here would be greatly appreciated!!!

Jorge Campos
  • 22,647
  • 7
  • 56
  • 87
arunb2w
  • 1,196
  • 9
  • 28
  • I have mentioned the correct data set above..I just mentioned the last data set to highlight where am getting wrong..Thx – arunb2w Dec 31 '15 at 12:06
  • Sorry it was not well aligned I believe.The last column is not the student id it the o/p column (diffmonth) – arunb2w Dec 31 '15 at 12:10
  • For that row `*4-Apr-14 1102 55586 5*` student 1102, what would be the result if the first date was also `15-Oct-13` – Jorge Campos Dec 31 '15 at 12:19
  • Why `19-Dec-14 1102 55586` is expected to be 8, when previous row matched `>=6` rule? – Timekiller Dec 31 '15 at 12:25
  • @JorgeCampos - If the first date was also 15-Oct-13 then it should be 5, since their diff in their months will be 5 – arunb2w Dec 31 '15 at 12:31
  • @Timekiller - The logic here is if a row matched >=6 rule then from that row you need to apply the same logic. In our cases the next row of "19-Dec-14 1102 55586" has the diff in month as 8, so we need to show 8 there – arunb2w Dec 31 '15 at 12:33
  • You need to do this using a recursive query. Are you using Oracle 11gR2 ? – Gordon Linoff Dec 31 '15 at 12:39
  • In the top of my mind I would use a join table getting the min date from the partitioned months of LectureiD, StudentID to use as calculation therefore you would have the lesser date to use that it is in fact what you need. – Jorge Campos Dec 31 '15 at 12:45
  • Please see if this would solve your problem. Note that I did it using postgresql since oracle in sqlfiddle is not working, I had to create the months_between function. Let me know if it would serve as a possible answer: http://sqlfiddle.com/#!15/ea0daf/1 – Jorge Campos Dec 31 '15 at 13:13
  • @JorgeCampos - Sure I will check and reply back – arunb2w Jan 01 '16 at 06:04
  • @JorgeCampos - The only issue i found here is , we are using partition by with only month so if there is a same month but with different year we might miss those records. So I think this should solve our issue "extract(month from fromdate) || extract(year from fromdate)" if we add it in the partition by . Please correct me if am wrong. Thanks – arunb2w Jan 02 '16 at 11:06
  • @JorgeCampos - And also, Seems it is not replacing the date from that of the one in the matched row. Instead we just getting the mindate based on the month,lectureid and studentid. So for dates like this "10-APR-13 15-MAY-13 01-OCT-13 15-OCT-13 15-OCT-13 04-APR-14". It is not working properly – arunb2w Jan 02 '16 at 11:20
  • @arunb2w I will take a look into these issues. – Jorge Campos Jan 02 '16 at 11:40

1 Answers1

2

Recursive solution:

with tmp as (
  select fromdate fd, lectureid lid, studentid sid, null mb,
         row_number() over (partition by lectureid, studentid order by fromdate) rn
    from above_table ),
cte (fd, ld, lid, sid, mb, rn) as (
  select fd, fd, lid, sid, mb, rn from tmp where rn=1
  union all
  select tmp.fd, case when floor(months_between(tmp.fd, cte.ld)) >= 6 
                      then tmp.fd else cte.ld end,
         tmp.lid, tmp.sid, floor(months_between(tmp.fd, cte.ld)), tmp.rn
    from tmp join cte on tmp.lid = cte.lid and tmp.sid = cte.sid and tmp.rn = cte.rn+1)
select to_char(fd, 'yyyy-mm-dd') fromdate, lid lecture, sid student, mb 
  from cte order by sid desc, fd

Test data and output:

create table above_table (Fromdate date, LectureiD number(6), StudentID number(6), Diff number(4));
insert into above_table values (date '2013-10-01', 1102, 55586, null);
insert into above_table values (date '2013-10-15', 1102, 55586, 0);
insert into above_table values (date '2013-10-15', 1102, 55586, 0);
insert into above_table values (date '2014-04-04', 1102, 55586, 6);
insert into above_table values (date '2014-12-19', 1102, 55586, 8);
insert into above_table values (date '2014-12-27', 1102, 55586, 0);
insert into above_table values (date '2015-01-14', 1102, 55586, 0);
insert into above_table values (date '2015-01-14', 1102, 55586, 0);
insert into above_table values (date '2015-09-29', 1102, 55586, 8);
insert into above_table values (date '2013-10-01', 1102, 55557, null);
insert into above_table values (date '2013-10-15', 1102, 55557, 0);
insert into above_table values (date '2013-10-15', 1102, 55557, 0);
insert into above_table values (date '2013-10-29', 1102, 55557, 0);
insert into above_table values (date '2014-04-04', 1102, 55557, 6);
insert into above_table values (date '2014-12-19', 1102, 55557, 8);

FROMDATE      LECTURE    STUDENT         MB
---------- ---------- ---------- ----------
2013-10-01       1102      55586            
2013-10-15       1102      55586          0 
2013-10-15       1102      55586          0 
2014-04-04       1102      55586          6 
2014-12-19       1102      55586          8 
2014-12-27       1102      55586          0 
2015-01-14       1102      55586          0 
2015-01-14       1102      55586          0 
2015-09-29       1102      55586          9 
2013-10-01       1102      55557            
2013-10-15       1102      55557          0 
2013-10-15       1102      55557          0 
2013-10-29       1102      55557          0 
2014-04-04       1102      55557          6 
2014-12-19       1102      55557          8

Explanation:

  1. Subquery tmp only enumerates rows separately for each lecture and students:

    select fromdate fd, lectureid lid, studentid sid, null mb, row_number() over (partition by lectureid, studentid order by fromdate) rn

  2. This line is the "anchor" of recursive subquery CTE, where we take two rows with numbered as 1 in first step

    select fd, fd, lid, sid, mb, rn from tmp where rn=1

  3. In this step I am attaching "recursive member" using condition tmp.rn = cte.rn+1 Specially important is part below, here I am checking if there are six months between last remembered date and date from current row:

    case when floor(months_between(tmp.fd, cte.ld)) >= 6 then tmp.fd else cte.ld end

  4. Last select is required part of syntax.


Some useful links:

marc_s
  • 732,580
  • 175
  • 1,330
  • 1,459
Ponder Stibbons
  • 14,723
  • 2
  • 21
  • 24
  • Is it possible to do the same using connect by prior. If so can you please guide me. – arunb2w Jan 20 '16 at 04:47
  • @arunb2w - In my opinion we cannot use `connect by` here. Recursive queries gives some more possibilities which I used to calculate temporary column `ld` and in consequence - column `mb` (months between). I do not see how could this be done in `connect by`. `Prior` is not enough. The alternative to recursive query is PLSQL or maybe `MODEL` clause (not sure of this). – Ponder Stibbons Jan 20 '16 at 10:38
  • i have few questions on this topic here. do u mind clarifying those for me. i have created chat room here. http://chat.stackoverflow.com/rooms/101205/room-for-arunb2w-and-ponder-stibbons please join if you have some free time – arunb2w Jan 20 '16 at 16:28
  • can you pls help on this?? – arunb2w Jan 21 '16 at 15:27
  • Please suggest me some links to understand this better – arunb2w Jan 21 '16 at 15:39
  • I added explanation and links. Hope this clarifies things. – Ponder Stibbons Jan 21 '16 at 18:49