AWS Glue Crawler creates multiple tables when reading empty files

Asked Feb 15 '22 at 14:19

I'm writing a Glue Crawler as part of an ETL pipeline, and I've run into a very annoying problem: the S3 bucket I'm crawling contains many different JSON files, all with the same schema. When crawling the bucket, the crawler creates a separate table for every empty file, plus one additional table for all the non-empty files.

When I manually delete the empty files and rerun the crawler, I get the expected behaviour: a single table is created from the non-empty files' data.
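Since removing the empty files by hand gives the expected single table, one option is to automate that cleanup as a step that runs before the crawler. This is a minimal, untested sketch that assumes "empty" means a zero-byte object (a file containing only whitespace or `[]` would not be caught) and that the caller has `s3:ListBucket` and `s3:DeleteObject` permissions. It relies on boto3's `list_objects_v2` paginator and `delete_objects`; the helper names `find_empty_keys`, `delete_batches` and `remove_empty_objects`, and the bucket and prefix in the usage comment, are hypothetical.

```python
# Sketch: find (and optionally delete) zero-byte objects under an S3 prefix
# before running a Glue crawler. Assumption: "empty" means Size == 0.
from typing import Iterable, List


def find_empty_keys(pages: Iterable[dict]) -> List[str]:
    """Return keys of zero-byte objects from list_objects_v2 response pages."""
    empty = []
    for page in pages:
        # A page has no "Contents" entry when nothing matched the prefix.
        for obj in page.get("Contents", []):
            if obj["Size"] == 0:
                empty.append(obj["Key"])
    return empty


def delete_batches(keys: List[str], batch_size: int = 1000) -> List[dict]:
    """Split keys into Delete payloads; delete_objects takes at most 1,000 keys."""
    return [
        {"Objects": [{"Key": k} for k in keys[i:i + batch_size]], "Quiet": True}
        for i in range(0, len(keys), batch_size)
    ]


def remove_empty_objects(s3_client, bucket: str, prefix: str, dry_run: bool = True) -> List[str]:
    """List zero-byte objects under bucket/prefix and delete them unless dry_run."""
    paginator = s3_client.get_paginator("list_objects_v2")
    keys = find_empty_keys(paginator.paginate(Bucket=bucket, Prefix=prefix))
    if not dry_run:
        for payload in delete_batches(keys):
            resp = s3_client.delete_objects(Bucket=bucket, Delete=payload)
            # With Quiet=True the response only lists failures, under "Errors".
            for err in resp.get("Errors", []):
                print(f"could not delete {err.get('Key')}: {err.get('Message')}")
    return keys


# Usage (hypothetical bucket/prefix; requires boto3 and AWS credentials):
#   import boto3
#   empty = remove_empty_objects(boto3.client("s3"), "my-bucket", "raw/json/", dry_run=True)
#   print(empty)  # review the list, then rerun with dry_run=False
```

Running it with `dry_run=True` first only lists the keys, which makes it easy to check that nothing unexpected would be removed. If the empty files are written by an upstream job, the cleanup would have to run after that job and before each crawl.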

Is there a way to avoid this? I'm having trouble deleting the empty files before crawling.
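If deleting the files is not an option, one crawler setting that may be worth trying is the table grouping policy `CombineCompatibleSchemas` (shown in the console as "Create a single schema for each S3 path"), which is set through the crawler's JSON `Configuration` string. It is not confirmed that this merges the tables produced for empty files, since an empty file gives the crawler no schema to compare. The helper names below are hypothetical; the sketch reads the current configuration with boto3's `get_crawler` so that other settings are kept, because `update_crawler` sets the `Configuration` string as a whole.

```python
# Sketch: enable the CombineCompatibleSchemas grouping policy on an existing
# crawler while keeping any other settings in its Configuration JSON.
import json
from typing import Optional


def merged_config(existing: Optional[str]) -> str:
    """Return a Configuration string with TableGroupingPolicy set."""
    config = json.loads(existing) if existing else {}
    config.setdefault("Version", 1.0)
    config.setdefault("Grouping", {})["TableGroupingPolicy"] = "CombineCompatibleSchemas"
    return json.dumps(config)


def enable_combined_schemas(glue_client, crawler_name: str) -> None:
    """Read the crawler's current Configuration and write back the merged one."""
    crawler = glue_client.get_crawler(Name=crawler_name)["Crawler"]
    glue_client.update_crawler(
        Name=crawler_name,
        Configuration=merged_config(crawler.get("Configuration")),
    )


# Usage (hypothetical crawler name; requires boto3 and AWS credentials):
#   import boto3
#   enable_combined_schemas(boto3.client("glue"), "my-json-crawler")
```

The crawler has to be rerun after the change, and tables it already created for the empty files may need to be dropped by hand.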

Many thanks.

Golden

Hi, were you able to fix this issue? I am also facing the same issue. – Nitesh May 31 '23 at 04:24
