0

It is my maiden voyage into Hive. I have multiple Hive tables, like snapshots with names as follows:

revenue_20110131
reveue_20110228
revenue_20110331

purchases_qrt1
purchases_qrt2
purchases_qrt3
purchases_qrt4

I have a lot of such snapshot tables. Now, I need to build a script that takes a part of table name as the parameter and reads the records from all such similarly named tables and exports the entire data from all those tables into a single ORC file.

How to do this in Hive? I have no idea where to start as I've never worked on Hive before. Can someone please help me? Thanks in advance, guys.

leftjoin
  • 36,950
  • 8
  • 57
  • 116
LearneR
  • 2,351
  • 3
  • 26
  • 50

1 Answers1

1

If the tables have common upper sub-directory in their location, you can create new table using upper directory and select all of them in single select.

create table new tbl 
...
location 'upper common directory path here'

then add these settings before select:

set hive.mapred.supports.subdirectories=TRUE;
set mapred.input.dir.recursive=TRUE;
leftjoin
  • 36,950
  • 8
  • 57
  • 116
  • what if "common upper sub-directory" contains other tables that should not be included into selection, is there any hive directive for handling that? – mangusta Jun 12 '19 at 18:41
  • 1
    Yes. exactly. Then you cannot query all these tables without listing all of them. You can write some script using shell which will add dummy partitions to the table based on ls path/purchases_qrt* – leftjoin Jun 12 '19 at 18:46
  • Is there a way to simply SELECT * the data of all these tables at once without having a to create a new table? Something like a VIEW? – LearneR Jun 13 '19 at 07:32
  • 1
    Using `union all` you can do `select * from tbl1 union all select * from tbl2 union all ...` And you need to select from each table specifying its name. There is no way to select * from all tables by template in a single statement – leftjoin Jun 13 '19 at 07:38