Is it necessary to update Athena partitions if the schema remains the same but new data is added?

Question

I'd like to monitor and analyze CloudTrail log files that are stored in an S3 bucket. I read the AWS docs on how CloudTrail and Athena work, and to optimize Athena queries I decided to create partitions.

Here is an example of the data layout in S3:

s3://<s3 bucket name>/AWSLogs/<account id>/CloudTrail/<region>/<year>/<month>/<day>/CloudTrail-log-file.json.gz

So, should LOCATION be set to s3://<s3 bucket name>/AWSLogs/<account id>/CloudTrail/ ?

And should the partitions be region, year, month, and day?

The main question is: do the partitions need to be updated when the schema stays the same but new data is added (for example, a new year/month/day directory)? Or, if the schema does not change, do I only need to define the partitions once?

If the partitions do have to be updated whenever a new year/month/day directory appears, which approach is best: a custom Lambda function triggered by S3 that calls the Athena API, or a Glue Crawler?

Thanks a lot for any information about this case.

score 1 · Answer 1 · answered Jun 24 '21 at 16:28

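If you use partition projection, there's no need to update the partitions: Athena computes them from the table's configuration at query time, so new year/month/day directories are picked up without any ALTER TABLE, Lambda, or crawler. Here are the docs: https://docs.aws.amazon.com/athena/latest/ug/partition-projection.html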
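
To make this concrete, here is a minimal sketch of what such a table definition could look like for the layout in the question. It is only an illustration, not a definitive setup: the function name, table name, column list, region list, and start date are all hypothetical, and the choice to fold year/month/day into a single date-typed partition called `day` is one possible design (three separate integer partitions would also work). The SerDe and input format lines follow the pattern in the AWS CloudTrail/Athena docs and should be checked against the current documentation.

```python
def cloudtrail_projection_ddl(table, bucket, account_id, regions, start_date):
    """Build an Athena CREATE TABLE statement that uses partition projection.

    All arguments are placeholders for illustration; start_date must use the
    same yyyy/MM/dd format as the S3 key layout.
    """
    # LOCATION is the common prefix above the partitioned part of the path.
    base = f"s3://{bucket}/AWSLogs/{account_id}/CloudTrail/"
    # ${region} and ${day} are filled in by Athena at query time, not by Python.
    template = base + "${region}/${day}"
    region_values = ",".join(regions)

    return f"""CREATE EXTERNAL TABLE {table} (
  eventtime STRING,
  eventsource STRING,
  eventname STRING,
  awsregion STRING
)
PARTITIONED BY (region STRING, day STRING)
ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe'
STORED AS INPUTFORMAT 'com.amazon.emr.cloudtrail.CloudTrailInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION '{base}'
TBLPROPERTIES (
  'projection.enabled' = 'true',
  'projection.region.type' = 'enum',
  'projection.region.values' = '{region_values}',
  'projection.day.type' = 'date',
  'projection.day.format' = 'yyyy/MM/dd',
  'projection.day.range' = '{start_date},NOW',
  'projection.day.interval' = '1',
  'projection.day.interval.unit' = 'DAYS',
  'storage.location.template' = '{template}'
)"""
```

Calling `print(cloudtrail_projection_ddl("cloudtrail_logs", "<s3 bucket name>", "<account id>", ["us-east-1"], "2021/01/01"))` gives a statement you can paste into the Athena console. Because the `day` range ends at `NOW`, data written under new date prefixes becomes queryable without touching the table again; only a new region would require editing `projection.region.values`.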

Nicolas Busca
