I'd like to monitor and analyze CloudTrail log files that are stored in an S3 bucket. I read the AWS docs on how CloudTrail and Athena work, and to optimize my Athena queries I decided to create some partitions.
Here is an example of how the data is laid out in S3:
s3://<s3 bucket name>/AWSLogs/<account id>/CloudTrail/<region>/<year>/<month>/<day>/CloudTrail-log-file.json.gz
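To make the layout concrete, here is a minimal Python sketch that splits an object key in this layout into the four folder levels that would become partition values. The names PartitionValues and partition_from_key are hypothetical, the account ID in the example is made up, and the sketch assumes month and day are zero-padded two-digit folders.

```python
import re
from typing import NamedTuple, Optional


class PartitionValues(NamedTuple):
    region: str
    year: str
    month: str
    day: str


# Key layout taken from the path above (the bucket name is not part of the key).
# Assumption: month and day folders are zero-padded, e.g. 2019/03/07.
_KEY_RE = re.compile(
    r"^AWSLogs/(?P<account>[^/]+)/CloudTrail/(?P<region>[^/]+)/"
    r"(?P<year>\d{4})/(?P<month>\d{2})/(?P<day>\d{2})/[^/]+$"
)


def partition_from_key(key: str) -> Optional[PartitionValues]:
    """Return the region/year/month/day folders of a log file key, or None."""
    m = _KEY_RE.match(key)
    if m is None:
        return None
    return PartitionValues(m["region"], m["year"], m["month"], m["day"])
```

For example, partition_from_key("AWSLogs/123456789012/CloudTrail/eu-west-1/2019/03/07/CloudTrail-log-file.json.gz") returns PartitionValues(region='eu-west-1', year='2019', month='03', day='07').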
So, should LOCATION be equal to s3://<s3 bucket name>/AWSLogs/<account id>/CloudTrail/? And are the partitions region, year, month and day?
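Assuming the table is declared with PARTITIONED BY (region string, year string, month string, day string) and its LOCATION is the CloudTrail/ prefix, each day folder could be registered with an ALTER TABLE ... ADD IF NOT EXISTS PARTITION ... LOCATION statement. A minimal Python sketch that builds such a statement (the function name, table name, bucket and account ID are hypothetical placeholders):

```python
def add_partition_ddl(table: str, bucket: str, account_id: str,
                      region: str, year: str, month: str, day: str) -> str:
    """Build an Athena DDL statement that maps one day folder to a partition."""
    # Explicit LOCATION pointing at the region/year/month/day folder.
    location = (f"s3://{bucket}/AWSLogs/{account_id}/CloudTrail/"
                f"{region}/{year}/{month}/{day}/")
    # IF NOT EXISTS makes re-running the statement for the same day harmless.
    return (f"ALTER TABLE {table} ADD IF NOT EXISTS "
            f"PARTITION (region = '{region}', year = '{year}', "
            f"month = '{month}', day = '{day}') "
            f"LOCATION '{location}'")


print(add_partition_ddl("cloudtrail_logs", "my-bucket", "123456789012",
                        "us-east-1", "2019", "02", "01"))
```

Because these folders are not in Hive-style key=value form (e.g. year=2019/), MSCK REPAIR TABLE cannot discover them on its own, which is why each partition is given an explicit LOCATION here.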
The main question is: do the partitions need to be updated when the schema stays the same but new data is added (for example, a new year/month/day directory)? Or, as long as the schema stays the same, do I only need to define the partitions once?
If the partitions do have to be updated whenever a new year/month/day directory is added, which approach is best for this: a custom Lambda function triggered by S3 that uses only the Athena API, or a Glue Crawler?
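For the Lambda option, here is a minimal sketch of an S3-triggered handler that derives the partition from each new object key and submits the DDL through Athena's StartQueryExecution API (boto3's start_query_execution). TABLE, DATABASE, OUTPUT_LOCATION and ddl_for_key are hypothetical placeholders, and the optional athena parameter is an assumption added so the logic can be tried with a stand-in client instead of AWS.

```python
import re
from typing import List, Optional
from urllib.parse import unquote_plus

# Hypothetical names: replace with your own table, database and results bucket.
TABLE = "cloudtrail_logs"
DATABASE = "default"
OUTPUT_LOCATION = "s3://my-athena-results/"

# Assumes the key layout from the question, with zero-padded month/day folders.
KEY_RE = re.compile(
    r"^AWSLogs/(?P<account>[^/]+)/CloudTrail/(?P<region>[^/]+)/"
    r"(?P<year>\d{4})/(?P<month>\d{2})/(?P<day>\d{2})/[^/]+$"
)


def ddl_for_key(bucket: str, key: str) -> Optional[str]:
    """Build the ADD PARTITION statement for one log file key, or None."""
    m = KEY_RE.match(key)
    if m is None:
        return None
    region, year, month, day = m["region"], m["year"], m["month"], m["day"]
    location = (f"s3://{bucket}/AWSLogs/{m['account']}/CloudTrail/"
                f"{region}/{year}/{month}/{day}/")
    return (f"ALTER TABLE {TABLE} ADD IF NOT EXISTS "
            f"PARTITION (region = '{region}', year = '{year}', "
            f"month = '{month}', day = '{day}') "
            f"LOCATION '{location}'")


def lambda_handler(event, context, athena=None) -> List[str]:
    """Register the partition of every new CloudTrail object in the S3 event."""
    if athena is None:
        import boto3  # bundled with the AWS Lambda Python runtime
        athena = boto3.client("athena")
    statements = set()  # one statement per distinct day folder in this batch
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        # Object keys in S3 event notifications are URL-encoded.
        key = unquote_plus(record["s3"]["object"]["key"])
        ddl = ddl_for_key(bucket, key)
        if ddl is not None:
            statements.add(ddl)
    query_ids = []
    for ddl in sorted(statements):
        response = athena.start_query_execution(
            QueryString=ddl,
            QueryExecutionContext={"Database": DATABASE},
            ResultConfiguration={"OutputLocation": OUTPUT_LOCATION},
        )
        query_ids.append(response["QueryExecutionId"])
    return query_ids
```

start_query_execution only submits the query; this sketch does not wait for it to finish. Since CloudTrail writes many files per day, most invocations will re-issue a statement for a partition that already exists. ADD IF NOT EXISTS keeps that harmless, but it still costs one Athena query per invocation.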
Thanks a lot for any information on this.