I have two tables containing identical data: one created by running a crawler on a .csv file, and the other an Iceberg table created with the following command:
CREATE TABLE dan_grafana.iced (
  meter string,
  readtime timestamp,
  kwh_total double)
PARTITIONED BY (`meter`, year(`readtime`))
LOCATION 's3://dev-aws/iceberg/iced'
TBLPROPERTIES (
  'table_type'='iceberg',
  'format'='parquet',
  'optimize_rewrite_delete_file_threshold'='10',
  'write_target_data_file_size_bytes'='134217728'
);
After creating the Iceberg table I copied the data from the .csv file into it; this was the only operation I performed on the Iceberg table. Reading from the Iceberg table takes twice as long as reading the plain .csv table, even though the reported cost is the same, and the number of bytes scanned is about 5x higher for the Iceberg table.
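For context, the copy was a single INSERT INTO ... SELECT in Athena, along the lines of the sketch below (dan_grafana.meter_csv is a stand-in for the actual name of the crawler-created table, and the CAST is only needed if the crawler typed the timestamp column as a string):

-- Sketch of the copy step; dan_grafana.meter_csv is a hypothetical
-- name for the table the crawler created from the .csv file.
INSERT INTO dan_grafana.iced
SELECT
  meter,
  CAST(readtime AS timestamp),  -- crawlers often infer CSV timestamps as string
  kwh_total
FROM dan_grafana.meter_csv;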
How can I improve the read performance of the Iceberg table?