0

I have 2 external hive tables as follows. I have populated data in them from oracle using sqoop.

create external table transaction_usa
(
tran_id int,
acct_id int,
tran_date string,
amount double,
description string,
branch_code string,
tran_state string,
tran_city string,
speendby string,
tran_zip int
)
row format delimited
stored as textfile
location '/user/stg/bank_stg/tran_usa';

create external table transaction_canada
(
tran_id int,
acct_id int,
tran_date string,
amount double,
description string,
branch_code string,
tran_state string,
tran_city string,
speendby string,
tran_zip int
)
row format delimited
stored as textfile
location '/user/stg/bank_stg/tran_canada';

Now i want to merge above 2 tables data as it is in 1 external hive table with all same fields as in the above 2 tables but with 1 extra column to identify that which data is from which table. The new external table with additional column as source_table. The new external table is as follows.

create external table transaction_usa_canada
(
tran_id int,
acct_id int,
tran_date string,
amount double,
description string,
branch_code string,
tran_state string,
tran_city string,
speendby string,
tran_zip int,
source_table string
)
row format delimited
stored as textfile
location '/user/gds/bank_ds/tran_usa_canada';

how can I do it.?

user2998990
  • 970
  • 5
  • 18
  • 36

3 Answers3

1

You do SELECT from each table and perform UNION ALL operation on these results and finally insert the result into your third table.

Below is the final hive query:

INSERT INTO TABLE transaction_usa_canada
SELECT tran_id, acct_id, tran_date, amount, description, branch_code, tran_state, tran_city, speendby, tran_zip, 'transaction_usa' AS source_table FROM transaction_usa
UNION ALL
SELECT tran_id, acct_id, tran_date, amount, description, branch_code, tran_state, tran_city, speendby, tran_zip, 'transaction_canada' AS source_table FROM transaction_canada;

Hope this help you!!!

Farooque
  • 3,616
  • 2
  • 29
  • 41
0

You can very well do it by manual partitioning as well.

CREATE TABLE transaction_new_table (
tran_id int,
acct_id int,
tran_date string,
amount double,
description string,
branch_code string,
tran_state string,
tran_city string,
speendby string,
tran_zip int
)
PARTITIONED BY (sourcetablename String)

Then run below command,

load data inpath 'hdfspath' into table transaction_new_table   partition(sourcetablename='1')
shankarsh15
  • 1,947
  • 1
  • 11
  • 16
  • hi shankar... thanks... but I only need merging of data from 2 tables with last and new field as source_table. I am using insert into new_table select columns from ext_table1, source_table_value. But its giving me an error – user2998990 May 18 '16 at 12:28
  • This will basically help you in faster retrieval of data . well in that case you also do a Union All. – shankarsh15 May 18 '16 at 12:31
0

You could use the INSERT INTO Clause of Hive

INSERT INTO TABLE table transaction_usa_canada 
SELECT tran_id, acct_id, tran_date, ...'transaction_usa' FROM transaction_usa;

INSERT INTO TABLE table transaction_usa_canada 
SELECT tran_id, acct_id, tran_date, ...'transaction_canada' FROM transaction_canada;
ZeusNet
  • 710
  • 9
  • 25