1

I have two table like below in HIVE.

Table A

+-----+---------+-----------+------------------------+------------------------+
| id  | event   | c_name    |              c_date    |          test_time     |
+-----+---------+-----------+------------------------+------------------------+
| 1   | click   | abc       | 2018-07-02 22:36:32.0  | 2018-06-22 22:36:32.0  |
| 2   | click   | abc 123   | 2018-07-01 22:36:32.0  | 2018-06-01 22:36:32.0  |
| 2   | click   | abc       | 2018-07-02 23:46:32.0  | 2018-07-02 23:46:32.0  |
| 3   | done    | abc 345   | 2018-07-22 23:56:32.0  | 2018-07-22 22:36:32.0  |
| 4   | done    | 123 abc   | 2018-08-22 22:36:32.0  | 2018-08-12 22:36:32.0  |
| 1   | click   | abc 123   | 2018-07-01 22:36:32.0  | 2018-07-01 22:36:32.0  |
+-----+---------+-----------+------------------------+------------------------+

Table B

+-----+---------+------------------------+
| id  | event   |          test_time     |
+-----+---------+------------------------+
| 1   | signup  | 2018-07-01 20:36:32.0  |
| 2   | signup  | 2018-07-02 23:36:32.0  |
| 3   | signup  | 2018-08-02 20:36:32.0  |
| 4   | signup  | 2018-09-02 20:36:32.0  |
+-----+---------+------------------------+

From table A I want to find the id, c_name, c_date from record which has occured before test_time in table B for each id based on test_time.

Expected result

+-----+-----------+------------------------+
| id  | c_name    |            c_date      |
+-----+-----------+------------------------+
| 1   | abc       | 2018-07-02 22:36:32.0  |
| 2   | abc 123   | 2018-07-01 22:36:32.0  |
| 3   | abc 345   | 2018-07-22 23:56:32.0  |
| 4   | 123 abc   | 2018-08-22 22:36:32.0  |
+-----+-----------+------------------------+

I have tried like below But not getting correct result`

select a.c_name, a.c_date, b.id from table A a left outer join table B b where a.id = b.id and a.test_time < b.test_time

How can I get the expected result

nmr
  • 605
  • 6
  • 20

1 Answers1

1

Use LEFT JOIN, try the following:

SELECT ta.id, 
       ta.c_name, 
       ta.c_date 
FROM Table_A AS ta 
LEFT JOIN Table_B AS tb 
  ON tb.id = ta.id 
WHERE tb.test_time > ta.test_time 
Madhur Bhaiya
  • 28,155
  • 10
  • 49
  • 57
  • is there a way without using `WHERE NOT EXISTS` or `where exists` in the sub query. `hive` is not supporting `WHERE NOT EXISTS` in subqueries – nmr Sep 19 '18 at 19:50
  • @nmr Yes there are. Give me some time. Let me investigate and update – Madhur Bhaiya Sep 19 '18 at 19:51
  • @nmr although as per https://stackoverflow.com/a/28334495/2469308 Hive seems to support in newer versions! – Madhur Bhaiya Sep 19 '18 at 19:52
  • It is the subquery `SubQuery expression refers to both Parent and SubQuery expressions and is not a valid join condition. ` near `test_time` – nmr Sep 19 '18 at 20:05
  • @nmr check the edited answer now! Removed exists - switched to left outer join. I hope `IS NOT NULL` is there in the hive – Madhur Bhaiya Sep 19 '18 at 20:06
  • I am getting `Both left and right aliases encountered in JOIN 'test_time` – nmr Sep 19 '18 at 20:28
  • Let us [continue this discussion in chat](https://chat.stackoverflow.com/rooms/180379/discussion-between-madhur-bhaiya-and-nmr). – Madhur Bhaiya Sep 19 '18 at 20:31