I'm writing a Glue Crawler as part of an ETL pipeline, and I have a very annoying problem: the S3 bucket I'm crawling contains many JSON files, all with the same schema, and some of them are empty. When I crawl the bucket, the crawler creates a separate table for every empty file, plus one additional table for the non-empty files.

When I manually delete the empty files and then run the crawler, I get the expected behaviour: a single table is created from the non-empty files' data.
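For reference, the manual cleanup described above could be scripted roughly as follows. This is only a sketch under some assumptions: "empty" is taken to mean zero-byte objects, the function names `find_empty_keys` and `delete_empty_objects` are made up for illustration, and `s3_client` is expected to be a boto3 S3 client (the bucket name and prefix in the usage comment are placeholders).

```python
def find_empty_keys(s3_client, bucket, prefix=""):
    """Return the keys of all zero-byte objects under the given prefix."""
    empty = []
    # list_objects_v2 returns at most 1000 objects per call, so paginate.
    paginator = s3_client.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        # "Contents" is absent when a page has no objects.
        for obj in page.get("Contents", []):
            if obj["Size"] == 0:
                empty.append(obj["Key"])
    return empty


def delete_empty_objects(s3_client, bucket, prefix=""):
    """Delete all zero-byte objects under the prefix and return their keys."""
    keys = find_empty_keys(s3_client, bucket, prefix)
    # delete_objects accepts at most 1000 keys per request.
    for start in range(0, len(keys), 1000):
        batch = keys[start:start + 1000]
        s3_client.delete_objects(
            Bucket=bucket,
            Delete={"Objects": [{"Key": key} for key in batch]},
        )
    return keys


# Hypothetical usage before starting the crawler (requires boto3):
#   import boto3
#   s3 = boto3.client("s3")
#   delete_empty_objects(s3, "my-bucket", "raw/json/")
```

Passing the client in as a parameter keeps the functions easy to test with a stub, but it does not help if the empty files cannot be deleted at all.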

Is there a way to avoid this? I'm having trouble deleting the empty files before crawling.

Many thanks.

Golden

0 Answers