I am using Spark 3.3.2. The code writes the same data to a Hive partitioned table (table A) and to an Iceberg partitioned table whose metadata is stored in Hive (table B). Both tables are in ORC format and use the same compression strategy.
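For context, the write path looks roughly like the sketch below (this is only a sketch, not the actual code; `df`, the catalog, database, and table names are placeholders):

```scala
// Minimal sketch of the dual write; df and all catalog/database/table names are placeholders.
df.createOrReplaceTempView("src")

// Table A: Hive partitioned table stored as ORC.
spark.sql("INSERT INTO db.table_a SELECT * FROM src")

// Table B: Iceberg partitioned table (metadata in the Hive metastore), also ORC.
spark.sql("INSERT INTO iceberg_catalog.db.table_b SELECT * FROM src")
```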
I ran the following test:
I first created an Iceberg table (table C) with the same schema as the Hive table, then added table A's data files into table C via the Spark add_files procedure.
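The registration step was along these lines (a sketch only; catalog, database, and table names are placeholders):

```scala
// Register table A's existing ORC data files into the Iceberg table C without rewriting them.
// Catalog / database / table names are placeholders.
spark.sql("""
  CALL iceberg_catalog.system.add_files(
    table        => 'db.table_c',
    source_table => 'db.table_a'
  )
""")
```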
Then I compared table C with table B via the ".data_files" metadata table, looking at the column_sizes field (I extract each field's size by its field id). The results are shown below.
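For reference, the comparison query was roughly like this (a sketch only; catalog/table names and the field id are placeholders; column_sizes maps each Iceberg field id to that column's on-disk byte size in a data file):

```scala
// Compare the total on-disk bytes of one field across the two tables.
// Catalog/table names and the field id (4) are placeholders.
spark.sql("""
  SELECT 'table_c' AS tbl, sum(element_at(column_sizes, 4)) AS field_bytes
  FROM iceberg_catalog.db.table_c.data_files
  UNION ALL
  SELECT 'table_b', sum(element_at(column_sizes, 4))
  FROM iceberg_catalog.db.table_b.data_files
""").show()
```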
The question is: why are the Iceberg table's field byte sizes bigger than the Spark SQL table's field byte sizes? The difference is especially large for the map and string types.
I tried setting write.orc.compression-strategy to speed or compression, but neither made any difference.
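What I tried was along these lines (a sketch; the table name is a placeholder):

```scala
// Attempted fix: force the ORC compression strategy on the Iceberg table.
// Neither 'speed' nor 'compression' changed the reported column sizes.
spark.sql("""
  ALTER TABLE iceberg_catalog.db.table_b
  SET TBLPROPERTIES ('write.orc.compression-strategy' = 'compression')
""")
```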