I have two tables which I want to join with a specific logic.
Table_1 ( S_No, ID, Date1, Date2 )
S_No ID Date1 Date2
1 id1 2014-05-01 2014-07-03
2 id1 2015-03-23 2016-06-18
3 id1 2016-06-21 2016-07-29
Table_2 ( S_No_New, ID_New, Date_New )
S_No_New ID_New Date_New
2_1 id1 2014-04-25
2_2 id1 2014-06-14
2_3 id1 2015-01-10
2_4 id1 2015-02-15
2_5 id1 2015-05-17
2_6 id1 2016-04-24
2_7 id1 2016-06-19
2_8 id1 2016-06-25
2_9 id1 2016-07-11
2_10 id1 2016-08-11
2_11 id1 2016-08-16
I want to join above two table in such a way that I get a count of how many rows are there in table_2 before Date1 and between Date1 and Date2 and then when we move to the next row we have to use the data which is not counted yet so far for the same id.
And if we have a date entry in table_2 after the last Date2 entry in table 1 then we need to append a new row with '+1" added to S_No and similar remaining column details.
Overall this problem can be split into two parts :
1) Getting the counts column
2) Adding up the extra rows ( S_No "4" in this example )
Please drop an answer if you know solution to either of the two.
Final output :
S_No ID Date1 Date2 Count_pre Count_Between
1 id1 2014-05-01 2014-07-03 1 1
2 id1 2015-03-23 2016-06-18 2 2
3 id1 2016-06-21 2016-07-29 1 2
4 id1 NULL NULL 2 0
Logic :
S_No 1 :
Count_Pre = Dates before 2014-05-01
Count_between = Dates between 2014-05-01 and 2014-07-03
S_No 2 :
Count_Pre = Dates between 2014-07-03 and 2015-03-23
Count_between = Dates between 2015-03-23 and 2016-06-18
and so on
Intermediate table has to look something like this:
S_No ID Date Date2 S_No_New Date_New
1 id1 2014-05-01 2014-07-03 2_1 2014-04-25
1 id1 2014-05-01 2014-07-03 2_2 2014-06-14
2 id1 2015-03-23 2016-06-18 2_3 2015-01-10
2 id1 2015-03-23 2016-06-18 2_4 2015-02-15
2 id1 2015-03-23 2016-06-18 2_5 2015-05-17
2 id1 2015-03-23 2016-06-18 2_6 2016-04-24
3 id1 2016-06-21 2016-07-29 2_7 2016-06-19
3 id1 2016-06-21 2016-07-29 2_8 2016-06-25
3 id1 2016-06-21 2016-07-29 2_9 2016-07-11
4 id1 NULL NULL 2_10 2016-08-11
4 id1 NULL NULL 2_11 2016-08-16
I was trying out different windowing and analytics function but couldn't get through this problem. Is it possible to do this kind of join in hive ( basic sql ) ?
NOTE : EDIT 2 : I need to implement this in hive and it supports all the builtin functions but not the variables from mysql. It supports aggregate, windowing and analytics funtions.
EDIT : Changed the date format from dd/mm/yyyy to yyyy-mm-dd