I have an existing table in Hive with several fields, e.g. (a string, b string, tstamp string, c string), one of which is a tstamp field.
I need to create a new partitioned table (table_partitioned) from the existing table (original_table), partitioned on the date part of the 'tstamp' field.
To do this I created a new table with a schema similar to the existing one, adding a partition column named 'date_string', using the following query:
CREATE TABLE table_partitioned (a string, b string, tstamp string, c string) PARTITIONED BY (date_string string);
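For reference, the resulting schema can be checked with:
DESCRIBE table_partitioned;
which shows 'date_string' listed as a partition column rather than a regular column.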
Then I tried to insert data into 'table_partitioned' using the following INSERT query:
INSERT OVERWRITE TABLE table_partitioned PARTITION (date_string) SELECT a,b,tstamp,c,to_date(tstamp) FROM original_table;
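In case it matters: as far as I understand, a dynamic-partition insert like this needs the dynamic-partition properties enabled for the session, so before the INSERT I would run something along these lines (the two max-partition values are illustrative, chosen because two years of daily partitions is roughly 730):
SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;
SET hive.exec.max.dynamic.partitions=2000;
SET hive.exec.max.dynamic.partitions.pernode=1000;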
But the above INSERT statement fails with the following error:
Diagnostic Messages for this Task:
FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.MapRedTask
MapReduce Jobs Launched:
Job 0: Map: 35 Cumulative CPU: 985.41 sec HDFS Read: 5011021574 HDFS Write: 2831773796 FAIL
Total MapReduce CPU Time Spent: 16 minutes 25 seconds 410 msec
If I load only some of the data by adding a WHERE clause to the INSERT statement, as below, it works and creates the partition for the matching rows:
INSERT OVERWRITE TABLE table_partitioned PARTITION (date_string) SELECT a,b,tstamp,c,to_date(tstamp) FROM original_table WHERE tstamp='2013-07-23 00:02:00';
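For reference, the partitions that get created can be listed with:
SHOW PARTITIONS table_partitioned;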
NOTE: I have around 200,000 rows for each day (e.g. 2013-07-23), and original_table holds data for around 2 years.
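That works out to roughly 2 x 365 = 730 daily partitions and on the order of 730 x 200,000 = 146 million rows in total.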
How can I insert all the data at once? Am I missing something?