0

I have two tables in in HDFS that I want to join using Impala. One is Employee_Logs the other is HR_Data.

Queries:

select e.employee_id, e.action from Employee_Logs e where e.employment_status_desc = 'Active'
select h.employee_id, h.name from HR_Data h

Employee_Logs:

employee_id  action
2325255b     login     
51666164     login
51666164v    login
r1211        logoff
r18552421    login

HR_Data:

employee_id  name
2325255      Rob    
51666164     Tom
r1211        Tammy
r18552421    Ron

I want to join them so that the data looks like this:

employee_id  action  name
2325255b     login   Rob  
51666164     login   Tom
51666164v    login   Tom
r1211        logoff  Tammy
r18552421    login   Ron

I could do an easy join if the employee_id field matched up on both tables, but the same user can have a "b" or a "v" after their employee id to specify if the account is elevated like an admin account. Some user accounts have an "r" in front of the id but that is the case in both tables.

Is there a way where I can do some where actions and create a new field in the Employee_Logs table like strip the "v" and "b" off of the end of the employee id and then join or is there a better way?

sectechguy
  • 2,037
  • 4
  • 28
  • 61

3 Answers3

2

Probably the safest method is multiple left joins:

select el.*,
       coalesce(h.name, hv.name, hb.name) as name
from employee_logs el left join
     hr_data h
     on el.employee_id = h.employee_id left join
     hr_data hv
     on el.employee_id = concat(h.employee_id, 'v') left join
     hr_data hb
     on el.employee_id = concat(h.employee_id, 'b');
Gordon Linoff
  • 1,242,037
  • 58
  • 646
  • 786
2
   Select employee_id,action,h1.name from Employee_Logs 
   where RTRIM(employee_id,'b','v'),name IN (Select employee_id,name 
   from HR_DATA as h1);

You can make use of subquery as above, as you have majority of the records needed in the Employee_logs itself and take the reference of common ids to get the name for each record. Or Left join is the best to use in such situations as well meaning will give the data which is common to both the tables keeping the left tables data as majority

Himanshu
  • 3,830
  • 2
  • 10
  • 29
  • I tried finding trim function in impala it showed somewhere idk exactly i found trim() though can check out if that above query works now. or you can use regexp to replace the employee_ids with characters with '' – Himanshu Nov 26 '18 at 17:53
1

Use regexp_replace in the join condition, where you replace b or v at the end of the string with an empty string to match employee id.

select el.employee_id,el.action,hr.name
from employee_logs el
join hr_data hr on hr.employee_id = regexp_replace(el.employee_id,'[bv]$','')
Vamsi Prabhala
  • 48,685
  • 4
  • 36
  • 58