I have two identical tables: one created by running a crawler on a .csv file, and the other an Iceberg table created with the following command:

CREATE TABLE dan_grafana.iced (
  meter string,
  readtime timestamp,
  kwh_total double)
PARTITIONED BY (`meter`, year(`readtime`))
LOCATION 's3://dev-aws/iceberg/iced'
TBLPROPERTIES (
  'table_type'='iceberg',
  'format'='parquet',
  'optimize_rewrite_delete_file_threshold'='10',
  'write_target_data_file_size_bytes'='134217728'
);

After creating the Iceberg table I copied the data from the .csv file into it (roughly as sketched below); this was the only operation I performed on the Iceberg table. Reading from the Iceberg table takes twice as long as reading the plain .csv table, even though the cost is the same, and the number of bytes scanned is about 5x higher for the Iceberg table.
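
For reference, a minimal sketch of how such a copy is typically done in Athena, assuming the crawler-created table is named dan_grafana.meter_csv (a hypothetical name; substitute the actual crawled table):

-- Copy all rows from the crawler-created CSV table into the Iceberg table.
-- dan_grafana.meter_csv is a placeholder name for the crawled table.
INSERT INTO dan_grafana.iced
SELECT meter, readtime, kwh_total
FROM dan_grafana.meter_csv;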

How can I improve the performance of the Iceberg table?


Dan M
  • I believe you are operating on a very small amount of data, which defeats the purpose of using Athena with Iceberg. Can you try larger datasets and compare? – Prabhakar Reddy Oct 05 '22 at 02:49
  • Thanks @PrabhakarReddy, I am operating on a very small amount of data because I wanted to see if that made a difference. I was getting similar performance discrepancies with 17.2 GB of data. I will run some more tests. – Dan M Oct 05 '22 at 15:28
