My question is somewhat similar to the below post. I want to download some data from a hive table using select
query. But because the data is large, I want to write it as an external table in a given path. so that I can create a csv file. Uses the below code
create external table output(col1 STRING, col2STRING)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
STORED AS TEXTFILE
LOCATION '{outdir}/output'
INSERT OVERWRITE TABLE output
Select col1, col2 from atable limit 1000
This works fine, and create a file in 0000_
format, which can be copied as a csv file.
But my question is how to ensure that the output
will always have a single file? If there is no partition defined, will it always be single file? What is the rule it uses to split files?
Saw few similar questions like below. But it discuss hdfs file access. How to point to a single file with external table
I know the below alternative, but I use a hive connection object to execute queries from a remote node.
hive -e ' selectsql; ' | sed 's/[\t]/,/g' > outpathwithfilename