
Assume I have a partitioned table in HDFS that receives new data all the time. New data is partitioned by day by default, while all the older files are partitioned by month. How can I merge partitions so that, in this example, all the daily partitions from the last month become a single monthly partition? Is there a way to repartition only some of the table’s partitions? I’d like only the partitions that are small enough to be merged.
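
One way to merge only the small partitions is to decide per month before touching any data. Below is a minimal plain-Python sketch. It assumes a hypothetical Hive-style layout where daily directories are named like `day=2020-01-15`, and it assumes the per-directory sizes have already been collected (for example from `hdfs dfs -du` output). `plan_monthly_merges` and its parameters are illustrative names, not part of any library.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def plan_monthly_merges(day_sizes: Dict[str, int], max_month_bytes: int,
                        skip_months: Iterable[str] = ()) -> Dict[str, List[str]]:
    """Return {month: [daily partition dirs]} for months small enough to merge.

    day_sizes maps hypothetical directory names like 'day=2020-01-15' to sizes
    in bytes. skip_months lets you leave out months still receiving data.
    """
    skip = set(skip_months)
    by_month = defaultdict(list)
    for name, size in day_sizes.items():
        month = name.split("=", 1)[1][:7]  # 'day=2020-01-15' -> '2020-01'
        by_month[month].append((name, size))

    plan = {}
    for month, parts in sorted(by_month.items()):
        total = sum(size for _, size in parts)
        # Only months whose combined size is under the threshold are merged.
        if month not in skip and total <= max_month_bytes:
            plan[month] = sorted(name for name, _ in parts)
    return plan
```

Months over the threshold stay as daily partitions. Passing the current month in `skip_months` avoids merging a month that is still being written to.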

Also, is it even possible to merge partitions, or should I read them, delete them, and write them again as one partition? I'm thinking of something like concatenating the files.
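
Concatenating files only works for plain line-oriented text. Formats such as Parquet or ORC carry footer metadata, so naive byte concatenation does not produce a valid file; they have to be read and rewritten instead. For the read/delete/write route, the safer order is: write the merged copy, publish it, and only then delete the inputs.

The sketch below shows that order on the local filesystem. It is a stand-in for HDFS, not HDFS client code, and it assumes headerless text files. `concat_day_partitions` is a hypothetical helper. In a Hive-style layout the day value lives only in the directory name, so rows lose it after concatenation unless the files themselves store the date. The metastore would also still need to be told about the new partition separately.

```python
import shutil
from pathlib import Path
from typing import Iterable


def concat_day_partitions(table_dir: str, day_dirs: Iterable[str],
                          month_dir_name: str) -> Path:
    """Concatenate the text files of several day directories into one month directory.

    Local-filesystem stand-in for HDFS; only valid for headerless,
    line-oriented text files (not Parquet/ORC).
    """
    table = Path(table_dir)
    days = sorted(day_dirs)
    target = table / month_dir_name
    staging = table / f".{month_dir_name}.tmp"  # dot-prefixed staging directory
    if target.exists():
        raise FileExistsError(target)
    staging.mkdir()

    merged = staging / "part-00000"
    with open(merged, "wb") as out:
        for day in days:
            for f in sorted((table / day).iterdir()):
                if not f.is_file():
                    continue
                data = f.read_bytes()  # small files, by assumption
                out.write(data)
                if data and not data.endswith(b"\n"):
                    out.write(b"\n")  # keep the next file's first row on its own line

    # 1) Publish the merged partition only once it is complete...
    staging.rename(target)
    # 2) ...and delete the daily inputs last, so a failure never loses data.
    for day in days:
        shutil.rmtree(table / day)
    return target / "part-00000"
```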

I’d like to know the best way to merge only some partitions of a table.

user7551211
  • If I understand correctly, you have daily incremental partitions with small file sizes, and you want to merge them into monthly partitions to make the files bigger? With Spark, you can read the daily partitions in, coalesce them into 1 partition, and write them to a new table partitioned by month. That should solve the problem. – E.ZY. Feb 19 '20 at 21:24
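
The Spark approach from the comment could look roughly like the PySpark sketch below. It rests on several assumptions: the data is Parquet, `day` is a hypothetical partition column holding ISO dates, the paths are hypothetical, and the target table is partitioned by a new `month` column. Setting `spark.sql.sources.partitionOverwriteMode` to `dynamic` (Spark 2.3+) makes an overwrite replace only the month partitions actually written, so rerunning for one month leaves the others alone.

```python
from datetime import date


def month_predicate(year: int, month: int, day_col: str = "day") -> str:
    """SQL predicate matching every daily partition of one calendar month.

    Assumes (hypothetically) that `day_col` holds ISO dates like '2020-01-15'.
    """
    start = date(year, month, 1)
    # First day of the following month, rolling over the year after December.
    end = date(year + 1, 1, 1) if month == 12 else date(year, month + 1, 1)
    return f"{day_col} >= '{start.isoformat()}' AND {day_col} < '{end.isoformat()}'"


def merge_month(spark, src_path: str, dst_path: str, year: int, month: int,
                day_col: str = "day") -> None:
    """Rewrite one month of daily partitions as a single monthly partition.

    src_path and dst_path are hypothetical Parquet table roots on HDFS.
    """
    # Imported lazily so month_predicate works without PySpark installed.
    from pyspark.sql import functions as F

    # Overwrite only the month partition being written, not the whole table.
    spark.conf.set("spark.sql.sources.partitionOverwriteMode", "dynamic")

    daily = spark.read.parquet(src_path).where(month_predicate(year, month, day_col))
    (daily
        .withColumn("month", F.lit(f"{year:04d}-{month:02d}"))
        .coalesce(1)  # one output file for the month, as the comment suggests
        .write
        .mode("overwrite")
        .partitionBy("month")
        .parquet(dst_path))
```

A call might look like `merge_month(spark, "hdfs:///warehouse/events_daily", "hdfs:///warehouse/events_monthly", 2020, 1)`; these paths are also hypothetical. The `day` value is kept as an ordinary column, so no information is lost. Once the monthly copy is verified, the daily directories for that month can be deleted. `coalesce(1)` pushes the whole month through a single task, which is fine for small partitions but slow for large ones.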
