Following AWS-recommended best practices, we have organization-wide CloudTrail and VPC Flow Logs configured to deliver to a centralized log archive account. Because both CloudTrail and the flow logs cover the whole organization across multiple regions, a large number of new log files is written to S3 every day. Most of these files are quite small (a few KB each).
The high number of small log files is fine while they're in the STANDARD storage class, since you just pay for the total data stored with no per-object minimum. However, we've found it challenging to deep archive these files after 6 or 12 months, because the colder storage classes all penalize small objects in one way or another: STANDARD-IA has a 128 KB minimum billable object size, GLACIER has no minimum billable size but adds roughly 40 KB of index/metadata overhead per object, and so on.
What are the best practices for archiving a large number of small S3 objects? I could use a Lambda to download batches of files, re-bundle them into larger archives, and write those back to S3, but that seems fairly expensive in terms of compute time and GET/PUT requests. As far as I can tell, S3 Batch Operations has no support for this. Any suggestions?
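For reference, this is roughly what I have in mind for the re-bundling approach (a minimal sketch only; the bucket, prefix, archive key, and DEEP_ARCHIVE target are placeholders, and in practice the bundle would need to be streamed or chunked rather than held entirely in memory inside a Lambda):

```python
import io
import tarfile
import boto3

s3 = boto3.client("s3")

BUCKET = "my-log-archive-bucket"     # hypothetical centralized log bucket
PREFIX = "AWSLogs/2023/01/"          # hypothetical: one month of log files
ARCHIVE_KEY = "bundles/2023-01.tar.gz"

buf = io.BytesIO()
keys = []

# Stream every small object under the prefix into a single tar.gz.
with tarfile.open(fileobj=buf, mode="w:gz") as tar:
    paginator = s3.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=BUCKET, Prefix=PREFIX):
        for obj in page.get("Contents", []):
            body = s3.get_object(Bucket=BUCKET, Key=obj["Key"])["Body"].read()
            info = tarfile.TarInfo(name=obj["Key"])
            info.size = len(body)
            tar.addfile(info, io.BytesIO(body))
            keys.append(obj["Key"])

# Upload the bundle once, directly into a cold storage class.
buf.seek(0)
s3.put_object(
    Bucket=BUCKET,
    Key=ARCHIVE_KEY,
    Body=buf.getvalue(),
    StorageClass="DEEP_ARCHIVE",
)

# Then delete the originals in batches of up to 1000 keys.
for i in range(0, len(keys), 1000):
    s3.delete_objects(
        Bucket=BUCKET,
        Delete={"Objects": [{"Key": k} for k in keys[i : i + 1000]]},
    )
```

Even with something like this, every source object still costs one GET and one DELETE, which is the overhead I'd like to avoid if there's a managed alternative.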