
I have stored data in HDFS using Pig's MultiStorage, split by the column id.

So the data is stored as:

/output/1/part-0000
/output/2/
/output/3/

Now I have created a partitioned table in Hive, and I want to load the data from the /output folder into this partitioned table. Is there any way to achieve this?

wazza

3 Answers


First, create a temporary Hive table and load all of the Pig output into it.
Then load your actual partitioned Hive table from the temp table,
with something like:

FROM emp_external temp INSERT OVERWRITE TABLE emp_partition PARTITION(country) SELECT temp.id, temp.name, temp.dept, temp.sal, temp.country;

Otherwise, you can explore HCatalog for this case.
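The full flow might look like the sketch below. The table and column names (emp_external, emp_partition, id, name, dept, sal, country) follow the example above; the delimiter and the /output location are assumptions you would adjust to your data:

```sql
-- Assumed: tab-delimited Pig output under /output, schema as in the answer.
CREATE EXTERNAL TABLE emp_external (
  id INT, name STRING, dept STRING, sal DOUBLE, country STRING
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
LOCATION '/output';

-- MultiStorage writes into subdirectories (/output/1, /output/2, ...),
-- so Hive may need to be told to read them recursively.
SET mapred.input.dir.recursive=true;
SET hive.mapred.supportsSubDirectories=true;

-- Dynamic partitioning must be enabled for PARTITION(country) with no value.
SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;

FROM emp_external temp
INSERT OVERWRITE TABLE emp_partition PARTITION (country)
SELECT temp.id, temp.name, temp.dept, temp.sal, temp.country;
```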

Amaresh

It is not clear whether you are looking to insert the data in the output folder (created by Pig) into an existing table, or to load it into a new partitioned Hive table.

If you want to load the data into a new Hive table, you can create a new partitioned table pointing at the output folder.

If you are looking to load the data into an existing Hive table, you can either create a temp table as @Aman mentioned and do an insert into the destination table,

or

you can just move/copy the files in HDFS from output/ to the Hive table's location.
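If you go the move route, Hive's LOAD DATA can do the HDFS move for you, one partition at a time. The table name emp_partition is an assumption; the partition column id and the /output/1, /output/2 paths come from the question:

```sql
-- LOAD DATA INPATH moves the files in each directory
-- into the corresponding partition's warehouse directory.
LOAD DATA INPATH '/output/1' INTO TABLE emp_partition PARTITION (id=1);
LOAD DATA INPATH '/output/2' INTO TABLE emp_partition PARTITION (id=2);
```

Note that LOAD DATA only moves files; it does not validate that their format matches the table's schema.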

Hope this helps

Anil

Assign a Hive schema to the Pig output location, with id as the partition column (ALTER TABLE ... ADD PARTITION). Now both are Hive tables, and you can use a WHERE clause over the partition column to move the data across.
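A sketch of this approach: declare an external table over the Pig output, register each MultiStorage subdirectory as a partition, then copy across with a WHERE on the partition column. The table names (pig_out, emp_partition) and the value columns are assumptions for whatever MultiStorage actually wrote:

```sql
-- External table over the Pig output; value columns are placeholders.
CREATE EXTERNAL TABLE pig_out (name STRING, dept STRING)
PARTITIONED BY (id INT)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';

-- Map each MultiStorage subdirectory to a partition.
ALTER TABLE pig_out ADD PARTITION (id=1) LOCATION '/output/1';
ALTER TABLE pig_out ADD PARTITION (id=2) LOCATION '/output/2';

-- Move the data using a WHERE over the partition column.
INSERT INTO TABLE emp_partition PARTITION (id=1)
SELECT name, dept FROM pig_out WHERE id = 1;
```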