I am using Spark 3.3.2. The code writes the same data to a Hive partitioned table (table A) and to an Iceberg partitioned table whose metadata is stored in Hive (table B). Both tables are in ORC format and use the same compression strategy.
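For context, the write path looks roughly like the sketch below (this is only a sketch, not the actual code; `df`, the catalog, database, and table names are placeholders):

```scala
// Minimal sketch of the dual write; df and all catalog/database/table names are placeholders.
df.createOrReplaceTempView("src")

// Table A: Hive partitioned table stored as ORC.
spark.sql("INSERT INTO db.table_a SELECT * FROM src")

// Table B: Iceberg partitioned table (metadata in the Hive metastore), also ORC.
spark.sql("INSERT INTO iceberg_catalog.db.table_b SELECT * FROM src")
```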
I ran the following test:
I first created an Iceberg table (table C) with the same schema as the Hive table, then added table A's data files into table C via the Spark add_files procedure.
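The registration step was along these lines (a sketch only; catalog, database, and table names are placeholders):

```scala
// Register table A's existing ORC data files into the Iceberg table C without rewriting them.
// Catalog / database / table names are placeholders.
spark.sql("""
  CALL iceberg_catalog.system.add_files(
    table        => 'db.table_c',
    source_table => 'db.table_a'
  )
""")
```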
Then I compared table C with table B via the ".data_files" metadata table, looking at the column_sizes field (I extract each field's size by its field id). The results are shown below.
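For reference, the comparison query was roughly like this (a sketch only; catalog/table names and the field id are placeholders; column_sizes maps each Iceberg field id to that column's on-disk byte size in a data file):

```scala
// Compare the total on-disk bytes of one field across the two tables.
// Catalog/table names and the field id (4) are placeholders.
spark.sql("""
  SELECT 'table_c' AS tbl, sum(element_at(column_sizes, 4)) AS field_bytes
  FROM iceberg_catalog.db.table_c.data_files
  UNION ALL
  SELECT 'table_b', sum(element_at(column_sizes, 4))
  FROM iceberg_catalog.db.table_b.data_files
""").show()
```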
The question is: why are the Iceberg table's field byte sizes bigger than the Spark SQL table's field byte sizes? The difference is especially large for the map and string types.
I tried setting write.orc.compression-strategy to speed or compression, but neither made any difference.
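What I tried was along these lines (a sketch; the table name is a placeholder):

```scala
// Attempted fix: force the ORC compression strategy on the Iceberg table.
// Neither 'speed' nor 'compression' changed the reported column sizes.
spark.sql("""
  ALTER TABLE iceberg_catalog.db.table_b
  SET TBLPROPERTIES ('write.orc.compression-strategy' = 'compression')
""")
```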