47

I run hive query by java code. Example:

"SELECT * FROM table WHERE id > 100"

How to export result to hdfs file.

Srinivas
  • 1,780
  • 1
  • 14
  • 27
cldo
  • 1,735
  • 6
  • 21
  • 26

11 Answers11

70

The following query will insert the results directly into HDFS:

INSERT OVERWRITE DIRECTORY '/path/to/output/dir' SELECT * FROM table WHERE id > 100;
Charles Menguy
  • 40,830
  • 17
  • 95
  • 117
  • You could use the `printf` UDF (0.9) to format your input as you want, otherwise creating your own UDF might be an option. – Charles Menguy Jan 12 '13 at 03:43
  • @Charles - can you show or link to an example of how to use a printf UDF to format output? Not obvious what to do here. – quux00 May 22 '13 at 03:27
  • Thinks Charles is suggesting you use the printf UDF to format what you output, for example: insert overwrite directory '/path/to/output/dir' select printf("%s\t%s",arg1, arg2) from table where id > 100; – Dan Bennett Jan 06 '15 at 18:09
40

This command will redirect the output to a text file of your choice:

$hive -e "select * from table where id > 10" > ~/sample_output.txt
tnguyen80
  • 417
  • 4
  • 3
28

This will put the results in tab delimited file(s) under a directory:

INSERT OVERWRITE LOCAL DIRECTORY '/home/hadoop/YourTableDir'
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t'
STORED AS TEXTFILE
SELECT * FROM table WHERE id > 100;
topkara
  • 886
  • 9
  • 15
3

I agree with tnguyen80's response. Please note that when there is a specific string value in query better to given entire query in double quotes.

For example:

$hive -e "select * from table where city = 'London' and id >=100" > /home/user/outputdirectory/city details.csv
Cleb
  • 25,102
  • 20
  • 116
  • 151
  • is this going to write to HDFS or to the local filesystem - that looks like local to me, whereas the "INSERT OVERWRITE DIRECTORY" method writes to HDFS – TobyEvans Nov 22 '16 at 17:59
3

The ideal way to do it will be using "INSERT OVERWRITE DIRECTORY '/pathtofile' select * from temp where id > 100" instead of "hive -e 'select * from...' > /filepath.txt"

3

@sarath how to overwrite the file if i want to run another select * command from a different table and write to same file ?

INSERT OVERWRITE LOCAL DIRECTORY '/home/training/mydata/outputs' SELECT expl , count(expl) as total
FROM ( SELECT explode(splits) as expl FROM ( SELECT split(words,' ') as splits FROM wordcount ) t2 ) t3 GROUP BY expl ;

This is an example to sarath's question

the above is a word count job stored in outputs file which is in local directory :)

Ankita
  • 480
  • 1
  • 6
  • 18
3

Two ways can store HQL query results:

  1. Save into HDFS Location
INSERT OVERWRITE DIRECTORY "HDFS Path" ROW FORMAT DELIMITED FIELDS TERMINATED BY '|'
SELECT * FROM XXXX LIMIT 10;
  1. Save to Local File
$hive  -e "select * from table_Name" > ~/sample_output.txt
$hive -e "select * from table where city = 'London' and id >=100" > /home/user/outputdirectory/city details.csv
MartenCatcher
  • 2,713
  • 8
  • 26
  • 39
seshu
  • 31
  • 1
2
  1. Create an external table
  2. Insert data into the table
  3. Optional drop the table later, which wont delete that file since it is an external table

Example:

Creating external table to store the query results at '/user/myName/projectA_additionaData/'

CREATE EXTERNAL TABLE additionaData
(
     ID INT,
     latitude STRING,
     longitude STRING
)
COMMENT 'Additional Data gathered by joining of the identified cities with latitude and longitude data' 
ROW FORMAT DELIMITED FIELDS
TERMINATED BY ',' STORED AS TEXTFILE location '/user/myName/projectA_additionaData/';

Feeding the query results into the temp table

 insert into additionaData 
     Select T.ID, C.latitude, C.longitude 
     from TWITER  
     join CITY C on (T.location_name = C.location);

Dropping the temp table

drop table additionaData
marc_s
  • 732,580
  • 175
  • 1,330
  • 1,459
user2458922
  • 1,691
  • 1
  • 17
  • 37
1

To directly save the file in HDFS, use the below command:

hive> insert overwrite  directory '/user/cloudera/Sample' row format delimited fields terminated by '\t' stored as textfile select * from table where id >100;

This will put the contents in the folder /user/cloudera/Sample in HDFS.

0

Enter this line into Hive command line interface:

insert overwrite directory '/data/test' row format delimited fields terminated by '\t' stored as textfile select * from testViewQuery;

testViewQuery - some specific view

WattsInABox
  • 4,548
  • 2
  • 33
  • 43
Don
  • 21
  • 1
  • Welcome to StackOverflow: if you post code, XML or data samples, please highlight those lines in the text editor and click on the "code samples" button ( { } ) on the editor toolbar or using Ctrl+K on your keyboard to nicely format and syntax highlight it! – WhatsThePoint Nov 22 '17 at 14:37
0

To set output directory and output file format and more, try the following:

INSERT OVERWRITE [LOCAL] DIRECTORY directory1
[ROW FORMAT row_format] [STORED AS file_format] 
SELECT ... FROM ...

Example:

INSERT OVERWRITE DIRECTORY '/path/to/output/dir'
ROW FORMAT DELIMITED
STORED AS PARQUET
SELECT * FROM table WHERE id > 100;
kennyut
  • 3,671
  • 28
  • 30