0

I’m trying to load data from a table with 1 column partitioned to a new table that has 2 partitioned columns, with the newer partitioned column being a regular column from the first table.

For example the create table statements (simplified and changed for understanding):

CREATE EXTERNAL TABLE `house`(                                     
   `color` string,                                     
   `size` string,                                    
   `owner` string)                                     
PARTITIONED BY (                                   
   `dt` date)

CREATE EXTERNAL TABLE `house_new`(                                   
   `size` string,                                    
   `owner` string)                                     
PARTITIONED BY (                                   
   `dt` date, `color` string)

I have a simple insert query like this:

INSERT OVERWRITE TABLE `house_new` PARTITION (`dt`='2018-06-10’, `color`)
SELECT
   `size`,                                      
   `owner`,                                     
   `color`
FROM house
WHERE `dt`='2018-06-10';

So this runs all the way and says the job was successfully completed in both terminal and the Tez UI. Even during the whole process I can see files actually being populated in the tmp/staging folders on S3 with all the correct partitions. However when it finally finishes no files are actually written and all the temporary files are deleted.

Hard coding the partitions (static partitioning) works, and also dynamic partitioning from a 1 partitioned column table to another 1 partitioned column table works. Going from a 1 partitioned column table and trying to derive the 2nd partition from the 1st table does not work.

dl8
  • 1,270
  • 1
  • 14
  • 34
  • Why do you define EXTERNAL TABLEs? Have you set `hive.exec.dynamic.partition=true` ? Also you don't need to specify the dt value in PARTITION; just put color and date as the last fields and remiove bacquotes from the field name everywhere – Harold Jun 15 '18 at 07:15
  • Because we use external tables. Yes that flag is set to true. I'm setting the dt value as a test to have it constrained to 1 day. For production we want to do it in batches otherwise the data would be too huge. The back quotes are needed for the prod values since there are reserved words so I just applied them to all columns for consistency. – dl8 Jun 15 '18 at 14:45
  • Does this `select count (*) cnt from house where `dt`='2018-06-10';` returns value>0 ? – leftjoin Jun 17 '18 at 17:05
  • Yes, doing regular selects work. There's data there, and even temp data in the correct partitions get written during the whole process. Just when the job finally finishes none of the final data gets written. – dl8 Jun 18 '18 at 16:01

0 Answers0