1

Apache Hive (version 1.2.1000.2.6.5.0-292)

I have a table, A, that has a large number of columns. I'm trying to select only the columns that I need from A and the columns I want live in a key-value pair table, B. (Example below). I can query B to get the columns I need, but I'm struggling to put the output of this sql query as the columns of the query used in A. Is there a way to do this in one sql query? I can write a python program to do this to create the SQL, but I'd rather have it in just one query for simplicity to the end-user.

DDL for tables

create table A (
a1 string,
a2 string,
a3 string,
b1 string,
b2 string,
b3 string,
)

create table B (
key string,
value string,
)

Data in table B (key value table). It should be noted that the data in the column value cannot be inferred upon by the corresponding value in column key. I have written them as a1,a2 for simplicity.

key,value
a,a1
a,a2
a,a3
b,b1
b,b2
b,b3

Query to get the correct columns = select value from B where key='a'

When you merge the results from this query with the Table A query you should get this sql statement

select a1,a2,a3 from A

As you can see, we are trying to derive the columns used in Table A

My first attempt doesn't work:

select 
(select value from B where key='a')
from A

What's the right way to do this?

Thanks in advance!

Micah Pearce
  • 1,805
  • 3
  • 28
  • 61

1 Answers1

0

You can try to generate query and write to file. Once done, you can call in existing hive hql using source command:

Here is sample queries by taking your example:

Create table and dummy data:

CREATE EXTERNAL TABLE IF NOT EXISTS a_table(
a1 string,
a2 string,
a3 string,
b1 string,
b2 string,
b3 string)
LOCATION '/user/xyz/a_table';

insert into table a_table
 VALUES ('a11', 'a12', 'a13','b11','b12','b13'), ('a21', 'a22', 'a23','b21','b22','b23');


CREATE EXTERNAL TABLE IF NOT EXISTS b_table (
key string,
value string
)
LOCATION '/user/xyz/b_table';

insert into table b_table
 VALUES ('a', 'a1'), ('a','a2'),('a','a3'), ('b', 'b1'), ('b','b2'),('b','b3');

Validate data into table:

select * from a_table;
OK
a11     a12     a13     b11     b12     b13
a21     a22     a23     b21     b22     b23
Time taken: 0.124 seconds, Fetched: 2 row(s)


select * from b_table;
OK
a       a1
a       a2
a       a3
b       b1
b       b2
b       b3
Time taken: 0.15 seconds, Fetched: 6 row(s)

This is hive hql part to generate statement based on given key and then using source to run query:

insert overwrite local directory '/home/xyz/temp_hql/out'
select concat_ws(" ", "select",concat_ws("," , collect_list(value)), "from a_table") 
from b_table where key = 'a';

source /home/xyz/temp_hql/out/000000_0;
OK
a11     a12     a13
a21     a22     a23

insert overwrite local directory '/home/xyz/temp_hql/out' 
select concat_ws(" ", "select",concat_ws("," , collect_list(value)), "from a_table") 
from b_table where key = 'b';

source /home/xyz/temp_hql/out/000000_0;
OK
b11     b12     b13
b21     b22     b23
Ramdev Sharma
  • 974
  • 1
  • 12
  • 17