1

My question is somewhat similar to the below post. I want to download some data from a hive table using select query. But because the data is large, I want to write it as an external table in a given path. so that I can create a csv file. Uses the below code

create external table output(col1 STRING, col2STRING)
ROW FORMAT DELIMITED
    FIELDS TERMINATED BY ','
    STORED AS TEXTFILE
    LOCATION '{outdir}/output' 

INSERT OVERWRITE TABLE output 
            Select col1, col2 from atable limit 1000

This works fine, and create a file in 0000_ format, which can be copied as a csv file.

But my question is how to ensure that the output will always have a single file? If there is no partition defined, will it always be single file? What is the rule it uses to split files?

Saw few similar questions like below. But it discuss hdfs file access. How to point to a single file with external table

I know the below alternative, but I use a hive connection object to execute queries from a remote node.

hive -e ' selectsql; ' | sed 's/[\t]/,/g' > outpathwithfilename

sjd
  • 1,329
  • 4
  • 28
  • 48

1 Answers1

0

You can set the below property before doing the overwrite

set mapreduce.job.reduces=1;

Note: If the hive engine doesn't allow to be modified at runtime, then whitelist the parameter by setting below property in hive-site.xml

hive.security.authorization.sqlstd.confwhitelist.append=|mapreduce.job.|mapreduce.map.|mapreduce.reduce.*

Prabhat Ratnala
  • 650
  • 5
  • 17