1

I have Athena and Athena Iceberg tables partitioned by multiple columns. I want to create a logic in Python script to break data if have more than 100 total partitions to insert the data without errors.

SHOW PARTITIONS table_name lists all the partitions but I need the count for them. I need it to be generic to get the count for any table.

I tried select count(*) from information_schema.__internal_partitions__ WHERE table_schema = 'db_name' AND table_name = 'table_name' but it doesn't work.

Any workarounds?

lightyagami96
  • 336
  • 1
  • 4
  • 14

1 Answers1

1

This works for Athena Engine V2:

SELECT count(*) from "table_name$partitions"  
Robert Kossendey
  • 6,733
  • 2
  • 12
  • 42
  • I'm getting this error - ```SYNTAX_ERROR: line 1:22: Table awsdatacatalog.db_name.table_name$partitions does not exist``` – lightyagami96 Jul 01 '22 at 18:12
  • Why is there db_name in the query? It should only be "YOUR_TABLE_NAME$partition" – Robert Kossendey Jul 01 '22 at 18:14
  • no, I've not put db_name in my query but the error does show that. – lightyagami96 Jul 01 '22 at 18:17
  • The error says that you are trying to query this table: `awsdatacatalog.db_name.table_name`. Try to remove "awsdatacatalog" – Robert Kossendey Jul 01 '22 at 18:19
  • I think there's some misunderstanding. my query is exactly this ```SELECT count(*) from "table_name$partitions" ``` but I'm still getting the said error with awscatalog.db_name.table_name$partitions doesn't exist. – lightyagami96 Jul 01 '22 at 18:23
  • Okay this is weird. Could you check your region? And can you check whether the same query works without the "$partitions"? It is working for me. – Robert Kossendey Jul 01 '22 at 18:24
  • so the region is ```us-east-1```. Not sure what you mean by without $partitions. Won't it be just a normal count(*)? – lightyagami96 Jul 01 '22 at 19:15
  • @lightyagami96 when you say you're running the query exactly as `SELECT count(*) from "table_name$partitions"`, do you mean you're not replacing "table_name" with your actual table's name? I just tried it on a partitioned Iceberg table I have in Athena and it worked for me as well. – samredai Jul 01 '22 at 19:18
  • Yes it would be anormal count. Just add to your normal count a $partitions after the table name and it should work. – Robert Kossendey Jul 01 '22 at 19:32
  • This works ```select count(*) from "iceberg_tag_poi" ``` but this doesn't ```SELECT count(*) from "iceberg_tag_poi$partitions"``` @RobertKossendey, @samredai – lightyagami96 Jul 01 '22 at 19:55
  • Which Athena Engine Version are you using? – Robert Kossendey Jul 01 '22 at 20:13
  • it's Version 2. us-east-1 – lightyagami96 Jul 01 '22 at 20:21
  • @lightyagami96 I just tried this on an unpartitioned table and I get the same error as you. Are you sure your Iceberg table is partitioned? You can check by running `DESCRIBE iceberg_tag_poi`. – samredai Jul 01 '22 at 21:05
  • Thanks for notifying that,but it is indeed partitioned. This is the result of describe query - ```# Partition spec: # field_name field_transform column_name build_time identity build_time``` – lightyagami96 Jul 01 '22 at 22:02
  • 1
    @lightyagami96 I was actually trying this on an external table. I was able to reproduce this error on a partitioned Iceberg table created using `'table_type' ='ICEBERG'`. It looks like this might not be supported yet but I'm sure it's on the Athena team's roadmap. – samredai Jul 02 '22 at 00:19
  • Thanks @samredai, I also raised a ticket to AWS Support team and they confirmed your answer. Because of ICEBERG's hidden partition, these commands don't work and currently they don't have any plans to support it. However, they said I can raise this as a feedback to their development team. Thanks for the help. – lightyagami96 Jul 13 '22 at 17:00