I'm using the function below to get all partitions from an AWS Glue catalog table. Some tables in the database have more than 50K partitions. Is it possible to get only the partitions that match a condition on the 'LastAccessTime' attribute? I know I can filter the list of all partitions after fetching it, but I want to know whether there is a way, using the 'Expression' parameter of get_partitions or something similar, to avoid scanning the entire catalog table.

def get_glue_partitions(glue_client, database_name, table_name):
    """Grabs all partitions for a table, and iterates through pagination."""
    partitions = []

    next_token = ''
    while True:  
        resp = glue_client.get_partitions(
            DatabaseName=database_name,
            TableName=table_name,
            NextToken=next_token)
        partitions += resp['Partitions']

        if 'NextToken' not in resp:
            break
        next_token = resp['NextToken']
    return partitions
Lisa Mathew

1 Answer

As far as I've been able to tell, the Expression parameter (--expression in the CLI) filters on the partition column(s) defined in your table, not on the properties of the Partition object returned by the AWS API. For instance, if your table is partitioned on create_date, the expression in the API call can reference this column:

glue_client.get_partitions(
    DatabaseName=database_name,
    TableName=table_name,
    NextToken=next_token,
    Expression="create_date > '2023-01-01'",
)

Unfortunately, to my knowledge this expression filter cannot operate on other properties of a Glue Partition object, such as LastAccessTime. Filtering on those still has to happen on the client side, after the partitions have been retrieved.
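
If it helps, here is a rough sketch of how the two steps could be combined: narrow the result set server-side with Expression on a partition column, then filter on LastAccessTime client-side. It uses boto3's paginator for get_partitions instead of a manual NextToken loop. The helper names (iter_partitions, partitions_accessed_since) and EXAMPLE_CUTOFF are made up for illustration. I'm also assuming LastAccessTime comes back as a timezone-aware datetime and may be missing on some partitions, so treat this as a starting point rather than a tested solution.

```python
from datetime import datetime, timezone

# Hypothetical cutoff for illustration; it is timezone-aware so it can be
# compared with the timezone-aware datetimes this sketch assumes boto3 returns.
EXAMPLE_CUTOFF = datetime(2023, 1, 1, tzinfo=timezone.utc)


def iter_partitions(glue_client, database_name, table_name, expression=None):
    """Yield partitions one at a time, optionally pre-filtered server-side."""
    kwargs = {"DatabaseName": database_name, "TableName": table_name}
    if expression:
        # Expression can only reference partition columns (e.g. create_date).
        kwargs["Expression"] = expression
    # The paginator handles NextToken for us.
    paginator = glue_client.get_paginator("get_partitions")
    for page in paginator.paginate(**kwargs):
        yield from page["Partitions"]


def partitions_accessed_since(partitions, cutoff):
    """Client-side filter on LastAccessTime, which Expression cannot see."""
    return [
        p for p in partitions
        # Skip partitions that do not report LastAccessTime at all.
        if p.get("LastAccessTime") is not None and p["LastAccessTime"] >= cutoff
    ]
```

Usage would look something like partitions_accessed_since(iter_partitions(glue_client, database_name, table_name, "create_date > '2023-01-01'"), EXAMPLE_CUTOFF), where glue_client is the same boto3 Glue client as in the question. This does not avoid the scan when no partition column correlates with access time, but it keeps memory use down and lets Expression do whatever narrowing it can.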

circld