0

i have created bucketed table called emp_bucket into 4 buckets clustered on salary column. The structure of the table is as below:

hive> describe Consultant_Table_Bucket;
OK
id                      int                                         
age                     int                                         
gender                  string                                      
role                    string                                      
salary                  double                                      
Time taken: 0.069 seconds, Fetched: 5 row(s)

I also have a staging table from where i can insert data into the above bucketed table. Below is the sample data in the staging table:

id      age     Gender   role         salary
-----------------------------------------------------
938     38      F       consultant      55038.0
939     26      F       student 33319.0
941     20      M       student 97229.0
942     48      F       consultant       78209.0
943     22      M       consultant 77841.0

My requirement is to load data into the bucketed table for those employees whose salary is greater than 10,000 and while loading i have to convert "consultant" role to BigData consultant role.

I know how to insert data into my bucketed table using the select command, but need some guidance how can the consultant value in the role column above can be changed to BigData consultant while inserting.

Any help appreciated

OneCricketeer
  • 179,855
  • 19
  • 132
  • 245
Sunil
  • 553
  • 1
  • 12
  • 30
  • Replace your column selection during insertion with a CASE statement. Saying, if it's a consultant, then insert the other value instead – OneCricketeer Jun 12 '18 at 00:34
  • If you show what `insert` statement you already wrote it will be easier to help you. – Guillaume Jun 12 '18 at 05:05
  • @Guillaume Below is my insert statement ```INSERT TABLE bucketed_user PARTITION (salary) select id, age,gender,role,salary FROM stage_table where salary > 10000``` – Sunil Jun 12 '18 at 06:57

1 Answers1

1

Based on your insert, you just need to work on the role part of your select:

INSERT into TABLE bucketed_user PARTITION (salary)
select
    id
  , age
  , gender
  , if(role='consultant', 'BigData consultant', role) as role
  , salary
FROM
  stage_table
where
  salary > 10000
;
Guillaume
  • 2,325
  • 2
  • 22
  • 40