Boto3 start glue crawler with new s3 input

Question

I have an AWS Glue crawler that looks at a specific S3 location containing Avro files. I have a process that writes files to a new subfolder of that location.

Once I run the crawler manually, the new subfolder shows up as a new table in the database, and it is also queryable from Athena.

Is there a way to automate the process and call the crawler programmatically, specifying only that new subfolder, so that it doesn't have to scan the entire parent folder structure? I want to add tables to a database, not partitions to an existing table.

I was looking for a Python option, and I have seen that one can do:

import boto3
glue_client = boto3.client('glue', region_name='us-east-1')
glue_client.start_crawler(Name='avro-crawler')

I haven't seen an option to pass a folder that limits where the crawler looks. Because there are hundreds of folders/tables in that location, re-crawling everything takes a long time, which I'm trying to avoid.

What are my options here? Would I need to programmatically create a new crawler for each new subfolder added to S3?

Or create a Lambda function that gets triggered when a new subfolder is added to S3? I've seen an answer here, but even with Lambda, it still implies calling start_crawler, which would crawl everything?

Thanks for any suggestions.

score 4 · Accepted Answer · edited Oct 25 '19 at 15:06


Set crawler_name to the name of your crawler and update_path to the new S3 path (for example, the new subfolder), then update the crawler's targets before starting it:

response = glue_client.update_crawler(
    Name=crawler_name,
    Targets={'S3Targets': [{'Path': update_path}]},
)
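
As a minimal sketch of how the update and the subsequent start_crawler call might be combined: the helper name crawl_subfolder and the example bucket path are hypothetical, and the function takes the Glue client as an argument rather than creating one itself.

```python
def crawl_subfolder(glue_client, crawler_name, s3_path):
    """Point an existing crawler at a single S3 path, then start it."""
    # Replace the crawler's targets with the one new subfolder.
    # Note: this overwrites the crawler's previous targets.
    glue_client.update_crawler(
        Name=crawler_name,
        Targets={"S3Targets": [{"Path": s3_path}]},
    )
    # start_crawler takes only the crawler name; it uses the stored targets.
    glue_client.start_crawler(Name=crawler_name)


# Example usage (bucket and prefix are placeholders):
# import boto3
# glue_client = boto3.client("glue", region_name="us-east-1")
# crawl_subfolder(glue_client, "avro-crawler",
#                 "s3://example-bucket/avro/new-subfolder/")
```

Keep in mind that update_crawler is rejected while the crawler is still running, so a caller that fires this for every new subfolder would likely need to wait for the previous run to finish first.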

answered Aug 21 '18 at 15:21 by Kishore · edited Oct 25 '19 at 15:06 by sam

Thanks, that's exactly the approach I ended up with. For reference: you can reuse an existing crawler; you just need to update its S3 target paths. Once that's done, start_crawler will use the updated S3 paths. – cristi.calugaru Aug 21 '18 at 16:47
But will it return everything back to normal after it runs? – pavel_orekhov May 30 '19 at 15:05
@hey_you No. It's a permanent change until the crawler is updated again. – Asclepius Apr 14 '21 at 23:04
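
Since the change is permanent, one possible way to put the original targets back is to read them with get_crawler before the update and write them back once the run has finished. This is only a sketch under assumptions: the helper name crawl_then_restore and the poll_seconds parameter are hypothetical, and it assumes the Targets structure returned by get_crawler can be passed back to update_crawler unchanged.

```python
import time


def crawl_then_restore(glue_client, crawler_name, s3_path, poll_seconds=30):
    """Crawl one S3 path with an existing crawler, then restore its targets."""
    # Remember the crawler's current targets so they can be restored later.
    crawler = glue_client.get_crawler(Name=crawler_name)["Crawler"]
    original_targets = crawler["Targets"]

    # Temporarily point the crawler at the new subfolder only.
    glue_client.update_crawler(
        Name=crawler_name,
        Targets={"S3Targets": [{"Path": s3_path}]},
    )
    try:
        glue_client.start_crawler(Name=crawler_name)
        # Poll until the crawler is idle again; it cannot be updated
        # while it is in the RUNNING or STOPPING state.
        while glue_client.get_crawler(Name=crawler_name)["Crawler"]["State"] != "READY":
            time.sleep(poll_seconds)
    finally:
        # Put the original targets back, even if starting the run failed.
        glue_client.update_crawler(Name=crawler_name, Targets=original_targets)
```

The polling loop blocks for the duration of the crawl, so inside a Lambda function with a short timeout this pattern may not fit as written; restoring the targets in a separate, later step would be an alternative.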
