0

I've came out some problem using hive to select data within large range partitions

Here's the HQL I want to execute:

INSERT OVERWRITE TABLE summary_T partition(DateRange='20131222-20131228')
select col1, col2, col3 From RAW_TABLE 
where cdate between '20131222' and '20131228' 
and (trim(col1) IS NULL or trim(col1)='')
and length(col2)=12;

"cdate" is the partition of table RAW_TABLE

but it stucks after giving me the job id

Once I change it into:

INSERT OVERWRITE TABLE summary_T partition(DateRange='20131222-20131228')
select col1, col2, col3 From RAW_TABLE 
where cdate between '20131222' and '20131225' 
and (trim(col1) IS NULL or trim(col1)='')
and length(col2)=12;

then it starts work

Is there any solution that can help me execute the first HQL?

thanks for helping!

Dennis Shen
  • 61
  • 1
  • 6

1 Answers1

0

I encountered a similar problem, and tried using CLUSTER BY 'partition_column' at the end of my SELECT statement. After using it I could execute my INSERT for a substantial greater date range.

So if you change your query to:

INSERT OVERWRITE TABLE summary_T partition(DateRange='20131222-20131228')
select col1, col2, col3 From RAW_TABLE 
where cdate between '20131222' and '20131228' 
and (trim(col1) IS NULL or trim(col1)='')
and length(col2)=12
CLUSTER BY DateRange;

The performance shall improve.

For explanation on how CLUSTER BY helps a query, you can go through this manual page which explains it in detail:

https://cwiki.apache.org/confluence/display/Hive/LanguageManual+SortBy

Neels
  • 2,547
  • 6
  • 33
  • 40
  • Thanks for sharing your experience!! I'll try out this method then report it back. – Dennis Shen Feb 22 '14 at 09:04
  • I get an error above: FAILED: SemanticException [Error 10004]: Line 6:11 Invalid table alias or column reference 'DateRange': (possible column names are: col1, col2, col3) looks like it can't execute the 'CLUSTER BY' with partition column – Dennis Shen Feb 24 '14 at 00:58
  • Just try to execute the Select statement with CLUSTER BY first.. In my case, I was able to execute the INSERT. It did not throw any such error. – Neels Feb 24 '14 at 05:46