
I have an existing table in Hive with various fields, e.g. (a string, b string, tstamp string, c string), including one timestamp field, 'tstamp'.

I need to create a new partitioned table (table_partitioned) from the existing table (original_table), where the new table is partitioned on the date part of the 'tstamp' field.

To do this I created a new table with a schema similar to the existing one, adding a partition column named 'date_string', using the following query:

CREATE TABLE table_partitioned (a string, b string, tstamp string, c string) PARTITIONED BY (date_string string)

Then I tried to insert data into 'table_partitioned' using the following INSERT query:

INSERT OVERWRITE TABLE table_partitioned PARTITION (date_string) SELECT a,b,tstamp,c,to_date(tstamp) FROM original_table;
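
For completeness, I believe a dynamic-partition insert like this requires the following session settings to be enabled first (I have these set in my session; without them, as far as I understand, the statement would be rejected at compile time):

```sql
-- Allow dynamic partition inserts at all
SET hive.exec.dynamic.partition=true;
-- 'nonstrict' allows all partition columns to be dynamic
-- (strict mode requires at least one static partition spec)
SET hive.exec.dynamic.partition.mode=nonstrict;
```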

But the above INSERT statement fails with errors like the following:

Diagnostic Messages for this Task:
FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.MapRedTask
MapReduce Jobs Launched:
Job 0: Map: 35   Cumulative CPU: 985.41 sec   HDFS Read: 5011021574 HDFS Write: 2831773796 FAIL
Total MapReduce CPU Time Spent: 16 minutes 25 seconds 410 msec

If I load only some of the data by adding a WHERE clause to the INSERT statement, as below, it works and creates partitions for the matching rows:

INSERT OVERWRITE TABLE table_partitioned PARTITION (date_string) SELECT a,b,tstamp,c,to_date(tstamp) FROM original_table WHERE tstamp='2013-07-23 00:02:00'

NOTE: I have around 200,000 entries for each day (e.g. 2013-07-23), and original_table has data for around 2 years.

How can I insert all the data at once? Am I missing something?

hitrix
    The error you are showing is not really useful in finding what's wrong, can you pull the real stack trace from the jobtracker and paste it in your question? – Charles Menguy Jul 23 '13 at 20:45
  • There are a lot of failed tasks, and each failed task has many attempts, so I was not sure which log among them to put here. Is there a particular log which could tell me specifically what the error is? – hitrix Jul 23 '13 at 21:14
  • what is your setting of `hive.exec.max.dynamic.partitions`? should be more than the expected amount ~730 for 2 full years. also check `hive.exec.max.dynamic.partitions.pernode` – dimamah Jul 24 '13 at 18:16
  • @dimamah values for hive.exec.max.dynamic.partitions and hive.exec.max.dynamic.partitions.pernode are default values 1000 and 100 respectively – hitrix Aug 02 '13 at 17:59
  • Is it possible you are generating more than 1000 partitions? This is your current limit for the dynamic load (`hive.exec.max.dynamic.partitions`). If not, please paste the output of the hive.log for this execution (should be in /tmp/$USER/hive.log) or check the location in $HIVE_HOME/conf/hive-log4j.properties – dimamah Aug 07 '13 at 16:58
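
Per dimamah's comments above, one likely cause is hitting the dynamic-partition limits: roughly 730 daily partitions over 2 years is under the default 1000 total, but well above the default per-node limit of 100. A sketch of raising the limits before the insert (the values below are illustrative, not tuned):

```sql
-- Roughly 730 daily partitions expected for 2 years; leave headroom
SET hive.exec.max.dynamic.partitions=2000;
-- Default is 100 per node, which a single mapper can easily exceed here
SET hive.exec.max.dynamic.partitions.pernode=1000;

INSERT OVERWRITE TABLE table_partitioned PARTITION (date_string)
SELECT a, b, tstamp, c, to_date(tstamp) FROM original_table;
```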

0 Answers