I have a very flat S3 bucket. Here is what the S3 object keys in that bucket look like:
- s3-access-logs/2017-11-03-00-22-36-05A50CD782AE8AE0
- s3-access-logs/2017-11-03-00-24-21-F14ED1FF6C315431
As you can see, I have only one S3 folder, "s3-access-logs", with ALL the objects under that folder. In fact, this S3 bucket contains the S3 access logs for a different S3 bucket.
I want to run some analysis on these S3 access logs (using Athena). Athena allows me to either:
- Create an Athena Table using the S3 bucket as location, or
- Create an Athena Table with partitioning turned on and add a partition using an S3 prefix.
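For reference, the second option would look roughly like the statement built below. This is only a sketch: the table name `access_logs`, the bucket name `my-log-bucket`, the partition key `dt`, and the helper `add_partition_ddl` are all placeholders I made up for illustration. The only part taken from my real setup is the "s3-access-logs" folder.

```python
# Sketch only: table, bucket and partition key names are hypothetical placeholders.
def add_partition_ddl(table: str, partition_key: str, value: str, location: str) -> str:
    """Build an ALTER TABLE ... ADD PARTITION statement as a plain string."""
    return (
        f"ALTER TABLE {table} ADD IF NOT EXISTS "
        f"PARTITION ({partition_key} = '{value}') "
        f"LOCATION '{location}'"
    )


# The location here is the folder-style prefix (ending in a slash) that Athena accepts.
ddl = add_partition_ddl(
    "access_logs",
    "dt",
    "2017-11-03",
    "s3://my-log-bucket/s3-access-logs/",
)
print(ddl)
```

The statement is only built and printed here; nothing is sent to Athena.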
I only care about the access logs for a certain date, so I want to avoid scanning the entire S3 bucket (I tried that, and the query had not completed after more than 15 minutes). I would like Athena to scan only the files for that date. I noticed that Athena is OK with using "s3-access-logs" as the S3 location / prefix, but it does not seem to support using "s3-access-logs/2017-11-03" as the S3 location / prefix.
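To make clear what I mean by a prefix that is not a folder, here is a small sketch in plain Python (no AWS calls). The helper name `keys_for_date` is hypothetical, and the third key is a made-up example for a different date. It shows the plain string-prefix match I would like Athena to apply.

```python
def keys_for_date(keys, folder, date):
    """Return the keys whose name starts with '<folder>/<date>'.

    The prefix does not end with a slash, so it cuts through the
    middle of the object name rather than at a folder boundary.
    """
    prefix = f"{folder}/{date}"
    return [key for key in keys if key.startswith(prefix)]


keys = [
    "s3-access-logs/2017-11-03-00-22-36-05A50CD782AE8AE0",
    "s3-access-logs/2017-11-03-00-24-21-F14ED1FF6C315431",
    "s3-access-logs/2017-11-04-00-00-00-EXAMPLE",  # made-up key for another date
]

matched = keys_for_date(keys, "s3-access-logs", "2017-11-03")
print(matched)
```

Only the two 2017-11-03 keys are selected, which is the subset I want the query to scan.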
Is it true that Athena only supports an "S3 folder" as the prefix or location (i.e. the prefix string must end with a slash), and not an arbitrary string prefix of the S3 object key? If so, is there any workaround for this issue?
Thanks!