Using Spark 3.3.2, my code writes the same data into a Hive partitioned table (table A) and an Iceberg partitioned table (table B, with metadata stored in Hive). Both tables are in ORC format and use the same compression strategy.
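For context, here is a minimal sketch of the dual-write setup; the database, table, and source names below are hypothetical placeholders:

```scala
// Table A: Hive partitioned table, stored as ORC (hypothetical names)
spark.sql("INSERT INTO db.table_a SELECT * FROM db.source_data")

// Table B: Iceberg partitioned table (metadata in the Hive catalog),
// with write.format.default = orc so both tables produce ORC files
spark.sql("INSERT INTO iceberg_catalog.db.table_b SELECT * FROM db.source_data")
```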

I ran the following test:

I first created an Iceberg table (table C) with the same schema as the Hive table and added table A's data files into table C (via the Spark add_files procedure). Then I compared table C with table B by selecting from the .data_files metadata table (using the column_sizes field and extracting each field's size by field-id). The results are shown below:

(screenshot of the per-field byte-size comparison between table B and table C)
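For reference, a rough sketch of the steps above; the catalog and table names are hypothetical, while add_files, the data_files metadata table, and its column_sizes map are standard Iceberg features:

```scala
// Import table A's existing ORC files into Iceberg table C without rewriting them
spark.sql("""
  CALL iceberg_catalog.system.add_files(
    table        => 'db.table_c',
    source_table => 'db.table_a'
  )
""")

// column_sizes in the data_files metadata table is a map of field-id -> on-disk bytes;
// explode it and sum per field-id to compare the two tables
def columnSizes(table: String, sizeCol: String) = spark.sql(s"""
  SELECT key AS field_id, SUM(value) AS $sizeCol
  FROM (SELECT explode(column_sizes) FROM $table.data_files) t
  GROUP BY key
""")

columnSizes("iceberg_catalog.db.table_b", "bytes_b")
  .join(columnSizes("iceberg_catalog.db.table_c", "bytes_c"), "field_id")
  .orderBy("field_id")
  .show()
```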

My question is: why are the Iceberg table's field byte sizes bigger than the Spark SQL (Hive) table's? This is especially pronounced for map and string types.

I tried setting write.orc.compression-strategy to both speed and compression, but this did not help.
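For completeness, this is roughly how the property was set (the table name is a placeholder). Note that write.orc.compression-strategy only tunes the ORC writer's speed/size trade-off; the codec itself is controlled separately by write.orc.compression-codec.

```scala
// Hypothetical table name; write.orc.compression-strategy accepts 'speed' or 'compression'
spark.sql("""
  ALTER TABLE iceberg_catalog.db.table_b
  SET TBLPROPERTIES ('write.orc.compression-strategy' = 'compression')
""")
```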
