AWS DMS Losing records using DatabaseMigrationService.Client.describe_table_statistics with large result set

Question

I'm using describe_table_statistics to retrieve the list of tables in the given DMS task, and conditionally looping over describe_table_statistics with the response['Marker'].

When I use no filters, I get the correct number of records, 13k+. When I use a filter, or combination of filters, whose result set has fewer than MaxRecords, I get the correct number of records.

However, when I pass in a filter that would get a record set larger than MaxRecords, I get far fewer records than I should.

Here's my function to retrieve the set of tables:

def get_dms_task_tables(account, region, task_name, schema_name=None, table_state=None):
   tables=[]
   max_records=500

   filters=[]
   if schema_name:
      filters.append({'Name':'schema-name', 'Values':[schema_name]})
   if table_state:
      filters.append({'Name':'table-state', 'Values':[table_state]})

   task_arn = get_dms_task_arn(account, region, task_name)

   session = boto3.Session(profile_name=account, region_name=region)
   client = session.client('dms')

   response = client.describe_table_statistics(
      ReplicationTaskArn=task_arn
      ,Filters=filters
      ,MaxRecords=max_records)

   tables += response['TableStatistics']

   while len(response['TableStatistics']) == max_records:
      response = client.describe_table_statistics(
         ReplicationTaskArn=task_arn
         ,Filters=filters
         ,MaxRecords=max_records
         ,Marker=response['Marker'])

      tables += response['TableStatistics']

   return tables

For troubleshooting, I loop over tables printing one line per table:

        print(', '.join((
            t['SchemaName']
            ,t['TableName']
            ,t['TableState'])))

When I pass in no filters and grep the output for the table state 'Table completed', I get 12k+ records, which matches the count shown in the console.

So superficially at least, the response loop works.

When I pass in a schema name and the table state filter conditions, I get the correct count, as confirmed by the console, but this count is less than MaxRecords.

When I just pass in the table state filter for 'Table completed', I only get 949 records, so I'm missing 11k records.

I've tried omitting the Filters parameter from the describe_table_statistics call inside the loop, but I get the same results in all cases.

I suspect there's something wrong with my call to describe_table_statistics inside the loop, but I've been unable to find examples of this in Amazon's documentation to confirm that.

cbeckley · Accepted Answer · 2019-03-30T21:22:08.270

When filters are applied, describe_table_statistics does not adhere to the MaxRecords limit.

In fact, what it seems to do is retrieve (2 x MaxRecords), apply the filter, and return that set. Or possibly it retrieves MaxRecords, applies the filter, and continues until the result set is larger than MaxRecords. Either way, a page is not guaranteed to contain exactly MaxRecords rows, so my while condition, which treated any other page size as the end of the results, was the problem.

I replaced

while len(response['TableStatistics']) == max_records:

with

while 'Marker' in response:

and now the function returns the correct number of records.
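The difference between the two loop conditions can be reproduced offline with a small simulation. This is only a sketch: FakeDmsClient and its page sizes are made up for illustration and say nothing about how DMS actually sizes its pages. The only behavior it borrows from the real API is that a response carries a 'Marker' key while more pages remain.

```python
class FakeDmsClient:
    """Hypothetical stand-in for the DMS client; NOT the real service."""

    def __init__(self, page_sizes):
        # One entry per page: how many rows that page returns (assumed values).
        self.page_sizes = page_sizes

    def describe_table_statistics(self, MaxRecords, Marker=None, **_ignored):
        page = int(Marker) if Marker is not None else 0
        rows = [{"TableName": f"t{page}_{i}"} for i in range(self.page_sizes[page])]
        response = {"TableStatistics": rows}
        if page + 1 < len(self.page_sizes):
            # A Marker is present only while more pages remain.
            response["Marker"] = str(page + 1)
        return response


def fetch_by_length(client, max_records):
    """Original condition: stop once a page is not exactly max_records long."""
    response = client.describe_table_statistics(MaxRecords=max_records)
    tables = list(response["TableStatistics"])
    while len(response["TableStatistics"]) == max_records:
        response = client.describe_table_statistics(
            MaxRecords=max_records, Marker=response["Marker"])
        tables += response["TableStatistics"]
    return tables


def fetch_by_marker(client, max_records):
    """Fixed condition: keep going while the response carries a Marker."""
    response = client.describe_table_statistics(MaxRecords=max_records)
    tables = list(response["TableStatistics"])
    while "Marker" in response:
        response = client.describe_table_statistics(
            MaxRecords=max_records, Marker=response["Marker"])
        tables += response["TableStatistics"]
    return tables


# Four pages, the second of which is shorter than MaxRecords (made-up sizes).
client = FakeDmsClient([500, 449, 500, 300])
print(len(fetch_by_length(client, 500)))  # 949: stops after the short page
print(len(fetch_by_marker(client, 500)))  # 1749: reads every page
```

In this simulation the length check gives up at the first short page even though two more pages remain, while the Marker check reads all of them.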

Incidentally, my first attempt was

while len(response['TableStatistics']) >= 1:

but on the last iteration of the loop it threw this error:

KeyError: 'Marker'
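That error suggests the final response simply has no 'Marker' key, so indexing it with response['Marker'] fails. A minimal sketch of a loop that tolerates the missing key by using dict.get is shown here. The function name collect_table_statistics is hypothetical, and the client is passed in so the loop can be exercised without AWS access.

```python
def collect_table_statistics(client, task_arn, filters=None, max_records=500):
    """Gather every page of table statistics, following Marker until it is absent."""
    kwargs = {
        "ReplicationTaskArn": task_arn,
        "Filters": filters or [],
        "MaxRecords": max_records,
    }
    tables = []
    while True:
        response = client.describe_table_statistics(**kwargs)
        tables += response["TableStatistics"]
        # .get returns None instead of raising KeyError on the last page.
        marker = response.get("Marker")
        if not marker:
            return tables
        # Ask for the next page on the following iteration.
        kwargs["Marker"] = marker
```

Folding the first call into the loop also means the request arguments are written only once.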

The completed and working function now looks like this:

import boto3

def get_dms_task_tables(account, region, task_name, schema_name=None, table_state=None):
    tables = []
    max_records = 500

    # Build the optional filters; an empty list means no filtering.
    filters = []
    if schema_name:
        filters.append({'Name': 'schema-name', 'Values': [schema_name]})
    if table_state:
        filters.append({'Name': 'table-state', 'Values': [table_state]})

    # get_dms_task_arn is a helper defined elsewhere (not shown here).
    task_arn = get_dms_task_arn(account, region, task_name)

    session = boto3.Session(profile_name=account, region_name=region)
    client = session.client('dms')

    response = client.describe_table_statistics(
        ReplicationTaskArn=task_arn,
        Filters=filters,
        MaxRecords=max_records)

    tables += response['TableStatistics']

    # Keep paging for as long as the response carries a Marker; the last
    # page has no 'Marker' key, which ends the loop.
    while 'Marker' in response:
        response = client.describe_table_statistics(
            ReplicationTaskArn=task_arn,
            Filters=filters,
            MaxRecords=max_records,
            Marker=response['Marker'])

        tables += response['TableStatistics']

    return tables
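As an alternative to tracking the Marker by hand, boto3 paginators can do the same bookkeeping. The sketch below assumes the installed boto3 version exposes a paginator for describe_table_statistics. The function name get_dms_task_tables_paginated is hypothetical, and the client is passed in rather than created inside the function.

```python
def get_dms_task_tables_paginated(client, task_arn, filters=None, page_size=500):
    """Collect table statistics by letting a boto3 paginator follow the Marker."""
    # Assumption: the client supports a paginator for this operation.
    paginator = client.get_paginator("describe_table_statistics")
    pages = paginator.paginate(
        ReplicationTaskArn=task_arn,
        Filters=filters or [],
        PaginationConfig={"PageSize": page_size},
    )
    tables = []
    for page in pages:
        # Each page is a normal response dict; its size may vary.
        tables.extend(page["TableStatistics"])
    return tables
```

With a real client, for example one created by boto3.Session(profile_name=account, region_name=region).client('dms'), calling client.can_paginate('describe_table_statistics') first is a quick way to check that the paginator is available.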
