I am trying to export a partitioned table. By default, the export generates a number of files based on the number of partitions. Is there a property I can set to merge these files, and what are the performance considerations of making this change?

I found a few properties for merging small files, but all of them seem to work only within a partition:

set hive.merge.tezfiles=true;
set hive.merge.mapfiles=true;
set hive.merge.mapredfiles=true;
set hive.merge.size.per.task=128000000;
set hive.merge.smallfiles.avgsize=128000000;

I also don't have the option of writing separate concatenation code to append the files at the end.

wololo

1 Answer

If I understood your question correctly, you could run select * from table_name and export the result to a single file. The output will contain all the data, with the partition values in separate columns.

beeline -u jdbc:hive2://quickstart:10000/default --quiet --outputformat=dsv --delimiterForDSV='|' --showHeader=false -e "select * from table_name" > output_file.csv

More on beeline output in the official doc.
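
As a quick sanity check on the exported file, here is a minimal sketch that counts rows per partition value. It assumes the partition column is the last pipe-delimited field (Hive normally returns partition columns last for select *). The sample rows, the file name sample_output.csv, and the helper count_by_partition are hypothetical and only there for illustration; point the function at your own output_file.csv instead.

```shell
# Hypothetical sample of what the exported file might look like:
# two data columns followed by the partition column (a date here).
printf '%s\n' \
  '1|alice|2020-01-01' \
  '2|bob|2020-01-01' \
  '3|carol|2020-01-02' > sample_output.csv

# Count rows per partition value, treating the last field as the partition column.
count_by_partition() {
  awk -F'|' '{ counts[$NF]++ } END { for (p in counts) print p "|" counts[p] }' "$1" | sort
}

count_by_partition sample_output.csv
# prints:
# 2020-01-01|2
# 2020-01-02|1
```

Comparing these counts with a select count(*) grouped by the partition column in Hive should show whether every partition made it into the single file.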

I don't think merging all the files across partitions is a good approach, as it may lead to data corruption.

Rishu Shrivastava