0

I am trying to executing a hive query, and export its output in HDFS with SEQUENCE FILE format.

beeline> show create table test_table;

+--------------------------------------------------------------------------------------+
|                                    createtab_stmt                                    |
+--------------------------------------------------------------------------------------+
| CREATE TABLE `test_table`(                                                           |
|   `XXXXXXXXXXXXXX` bigint,                                                           |
|   `XXXXXXXXXXXxx` int,                                                               |
|   `XXXXXXXXX` int,                                                                   |
|   `XXXXXX` int)                                                                      |
| PARTITIONED BY (                                                                     |
|   `XXXXXXXX` string,                                                                 |
|   `XXXX` string,                                                                     |
|   `XXXXXXXX` string)                                                                 |
| ROW FORMAT DELIMITED                                                                 |
|   FIELDS TERMINATED BY '\u00001'                                                     |
| STORED AS INPUTFORMAT                                                                |
|   'org.apache.hadoop.mapred.SequenceFileInputFormat'                                 |
| OUTPUTFORMAT                                                                         |
|   'org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat'                        |
| LOCATION                                                                             |
|   'hdfs://localhost:8020/user/hive/warehouse/local_hive_report.db/test_table'        |
| TBLPROPERTIES (                                                                      |
|   'transient_lastDdlTime'='1437569941')                                              |
+--------------------------------------------------------------------------------------+

Here is the query which I tried to export the data,

beeline> INSERT OVERWRITE DIRECTORY '/user/nages/load/date' 
    ROW FORMAT DELIMITED 
    FIELDS TERMINATED BY ',' 
    LINES TERMINATED BY '\n' 
    STORED AS SEQUENCEFILE 
    SELECT * FROM test_table WHERE column=value;

Here is the error,

    Error: Error while compiling statement: FAILED: ParseException line 1:61 
cannot recognize input near 'ROW' 'FORMAT' 'DELIMITED' in statement (state=42000,code=40000)

Am I missing something here?

Software version: Cloudera hadoop CDH5.3.3, Apache version 0.13.1.

Edit: Updated my temprorary solution below.

Nageswaran
  • 7,481
  • 14
  • 55
  • 74

3 Answers3

0

This is because the hive query will by default use the ^ as the delimiter. You can try the same by exporting to local file system.That should be supported.

beeline> INSERT OVERWRITE LOCAL DIRECTORY '/user/~local directoryname' 
    ROW FORMAT DELIMITED 
    FIELDS TERMINATED BY ',' 
    LINES TERMINATED BY '\n' 
    STORED AS SEQUENCEFILE 
    SELECT * FROM test_table WHERE column=value;
maxymoo
  • 35,286
  • 11
  • 92
  • 119
  • 1) I want the output in HDFS. 2) The file format should be SEQUENCE FILE. Is there any property to override to achieve this? – Nageswaran Jul 27 '15 at 07:09
  • 1
    create an external table to the location where you want your output file.Use create table as command and insert the required data into the external table.By that you will get the data in the HDFS location. – kiran Sreekumar Jul 27 '15 at 13:41
  • Yeah, that is what I did finally. – Nageswaran Jul 28 '15 at 04:52
0

As a temporary fix, I have created a Hive table with sequence file format, and inserted selected records into it.

CREATE TABLE temp_table
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n'
STORED AS SEQUENCEFILE
AS 
SELECT * FROM test_table WHERE column=value;

This will create sequence file in the following location in HDFS.

/<HIVE_DATABASE_ROOT>/temp_table.db/

Nageswaran
  • 7,481
  • 14
  • 55
  • 74
0

This script works for me:

CREATE EXTERNAL TABLE dept_seq (department_id int, department_name string) ROW FORMAT DELIMITED FIELDS TERMINATED by '\001' STORED AS INPUTFORMAT 'org.apache.hadoop.mapred.SequenceFileInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat' LOCATION 'hdfs:///user/cloudera/departments_seq';
Om Sao
  • 7,064
  • 2
  • 47
  • 61
Sonai
  • 1