I'm trying to read only the most recently written partition of a table in S3 from a Glue job, loading it as a DynamicFrame with a push-down predicate.
The table I want to read from is loaded every day, so a new partition is created for each day's data.
I have another Glue job that reads from that table, but I only want the data in that latest partition. I don't want to read the whole table and then filter for the latest data (large data volume, inefficiency, cost...), since I could use the push-down predicate instead. The problem is that the value of the latest partition changes daily.
I have tried using boto3 to list the objects in S3, and the get_partitions function to retrieve the partition values. I also know I can query this in Athena:
SELECT partition_key, max(partition_value)
FROM information_schema.__internal_partitions__
WHERE table_schema = '<database name>'
AND table_name = '<table name>'
GROUP BY 1
But is there an easier way to achieve this in a Glue Job?
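To make the goal concrete, here is a rough sketch of what I have in mind, not a tested solution. It assumes the partition values sort correctly as strings (for example zero-padded dates), and the names `fetch_partitions`, `latest_partition_values`, `build_predicate`, `my_db`, `my_table` and the `year`/`month`/`day` keys are all hypothetical placeholders I made up for illustration.

```python
from typing import Dict, Iterable, List, Sequence


def latest_partition_values(partitions: Iterable[Dict]) -> List[str]:
    """Return the 'Values' list of the greatest partition.

    Assumption: values compare correctly as strings, e.g. zero-padded
    dates such as ['2023', '01', '09'].
    """
    values = [p["Values"] for p in partitions]
    if not values:
        raise ValueError("table has no partitions")
    return max(values)  # element-wise lexicographic comparison of the lists


def build_predicate(keys: Sequence[str], values: Sequence[str]) -> str:
    """Build a predicate string such as "year == '2023' and month == '01'"."""
    return " and ".join(f"{k} == '{v}'" for k, v in zip(keys, values))


def fetch_partitions(database: str, table: str) -> List[Dict]:
    """List every partition of a Data Catalog table through boto3."""
    import boto3  # imported lazily so the helpers above run without AWS access

    glue = boto3.client("glue")
    paginator = glue.get_paginator("get_partitions")
    partitions: List[Dict] = []
    for page in paginator.paginate(DatabaseName=database, TableName=table):
        partitions.extend(page["Partitions"])
    return partitions


# Intended use inside the Glue job (glueContext is the job's GlueContext):
# keys = ["year", "month", "day"]  # hypothetical partition keys
# values = latest_partition_values(fetch_partitions("my_db", "my_table"))
# dyf = glueContext.create_dynamic_frame.from_catalog(
#     database="my_db",
#     table_name="my_table",
#     push_down_predicate=build_predicate(keys, values),
# )
```

This still lists every partition before picking the maximum, which is why I'm asking whether Glue offers something more direct.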
Thanks