Is there an easy way to set up a bucket in s3 to automatically delete files older than x days?
8 Answers
Amazon has since introduced S3 lifecycle rules (see the introductory blog post Amazon S3 - Object Expiration), which let you specify a maximum age in days for objects in a bucket; see Object Expiration for details on using it via the S3 API or the AWS Management Console.
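For reference, here is a minimal boto3 sketch of such an expiration rule; the bucket name and the 30-day window are placeholders, not values from the original answer:

import boto3

s3 = boto3.client('s3')

# Add a lifecycle rule that expires (deletes) every object 30 days after creation.
s3.put_bucket_lifecycle_configuration(
    Bucket='my-bucket-name',  # placeholder bucket name
    LifecycleConfiguration={
        'Rules': [{
            'ID': 'expire-after-30-days',
            'Filter': {'Prefix': ''},  # empty prefix = apply to the whole bucket
            'Status': 'Enabled',
            'Expiration': {'Days': 30},
        }]
    },
)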

- +1 for providing an update regarding this outdated information, thanks! – Steffen Opel Feb 26 '12 at 16:44
Amazon now has the ability to set lifecycle rules on a bucket to automatically expire content:
https://docs.aws.amazon.com/AmazonS3/latest/userguide/how-to-set-lifecycle-configuration-intro.html
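If you configure the rule through the console as the linked guide describes, a small boto3 sketch like the following can read it back to verify; the bucket name is a placeholder:

import boto3

s3 = boto3.client('s3')

# Read back the lifecycle rules currently attached to the bucket.
response = s3.get_bucket_lifecycle_configuration(Bucket='my-bucket-name')  # placeholder name
for rule in response['Rules']:
    print(rule['ID'], rule['Status'], rule.get('Expiration'))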
- That link no longer works. https://docs.aws.amazon.com/AmazonS3/latest/userguide/how-to-set-lifecycle-configuration-intro.html has the latest information – Nikhil Jindal Apr 16 '21 at 00:47
You can use s3cmd to write a script that runs through your bucket and deletes files based on a precondition, such as their age.
You'll need to write some code (bash, python) on top of it.
You can download s3cmd from http://s3tools.org/s3cmd

Shell script to delete old files using the s3cmd utility.
Source: http://shout.setfive.com/2011/12/05/deleting-files-older-than-specified-time-with-s3cmd-and-bash/
#!/bin/bash
# Usage: ./deleteOld "bucketname" "30 days"
s3cmd ls s3://$1 | while read -r line; do
    # the first two columns of `s3cmd ls` output are the date and time
    createDate=`echo $line | awk '{print $1" "$2}'`
    createDate=`date -d "$createDate" +%s`
    olderThan=`date -d "-$2" +%s`
    if [[ $createDate -lt $olderThan ]]; then
        # drop the date, time and size columns to recover the s3:// object URL
        fileName=`echo $line | awk '{$1=$2=$3=""; print $0}' | sed 's/^[ \t]*//'`
        echo "$fileName"
        if [[ $fileName != "" ]]; then
            s3cmd del "$fileName"
        fi
    fi
done

- Usage: ./deleteOld "bucketname" "30 days"; e.g. for s3://dir1/dir2/dir3/, use bucketname = "dir1/dir2/dir3/" and never omit the trailing "/". – Aug 23 '16 at 04:20
- What if the file name has a space and I need to print all columns after that? `Video 1280x720 (2)13201781136780000000.mp4` just gives Video, not the rest. – Ramratan Gupta Sep 20 '17 at 09:07
- I got the solution from https://stackoverflow.com/a/9745022/1589444 – Ramratan Gupta Sep 20 '17 at 09:18
WINDOWS / POWERSHELL
If the lifecycle option does not suit you, on Windows Server this can be done with a simple PowerShell script:
#set a bucket name
$bucket = "my-bucket-name"
#set the expiration date of files
$limit_date = (Get-Date).AddDays(-30)
#get all the files
$files = aws s3 ls "$($bucket)"
#extract the file name and date
$parsed = $files | ForEach-Object { @{ date = $_.split(' ')[0] ; fname = $_.split(' ')[-1] } }
#keep only files older than $limit_date
$filtred = $parsed | Where-Object { ![string]::IsNullOrEmpty($_.date) -and [datetime]::ParseExact($_.date, 'yyyy-MM-dd', $null) -lt $limit_date }
#remove filtered files
$filtred | ForEach-Object { aws s3 rm "s3://$($bucket)/$($_.fname)" }
This script can fit into one command. Just replace my-bucket-name with the name of your bucket.
aws s3 ls my-bucket-name | ForEach-Object { @{ date = $_.split(' ')[0] ; fname = $_.split(' ')[-1] } } | Where-Object { ![string]::IsNullOrEmpty($_.date) -and [datetime]::ParseExact($_.date, 'yyyy-MM-dd', $null) -lt (Get-Date).AddDays(-30) } | ForEach-Object { aws s3 rm s3://my-bucket-name/$_.fname }
Note that this script only deletes files from the root of the bucket, not recursively. If you need to remove data from a subdirectory, specify that prefix before /$_.fname

Here is a Python script to delete files older than N days:
from boto3 import client, Session
from botocore.exceptions import ClientError
from datetime import datetime, timezone
import argparse

if __name__ == '__main__':
    parser = argparse.ArgumentParser()
    parser.add_argument('--access_key_id', required=True)
    parser.add_argument('--secret_access_key', required=True)
    parser.add_argument('--delete_after_retention_days', required=False, default=15)
    parser.add_argument('--bucket', required=True)
    parser.add_argument('--prefix', required=False, default="")
    parser.add_argument('--endpoint', required=True)

    args = parser.parse_args()

    access_key_id = args.access_key_id
    secret_access_key = args.secret_access_key
    delete_after_retention_days = int(args.delete_after_retention_days)
    bucket = args.bucket
    prefix = args.prefix
    endpoint = args.endpoint

    # get current date
    today = datetime.now(timezone.utc)

    try:
        # create a connection to Wasabi
        s3_client = client(
            's3',
            endpoint_url=endpoint,
            aws_access_key_id=access_key_id,
            aws_secret_access_key=secret_access_key)
    except Exception as e:
        raise e

    try:
        # list all the buckets under the account
        list_buckets = s3_client.list_buckets()
    except ClientError:
        # invalid access keys
        raise Exception("Invalid Access or Secret key")

    # create a paginator for all objects.
    object_response_paginator = s3_client.get_paginator('list_object_versions')
    if len(prefix) > 0:
        operation_parameters = {'Bucket': bucket,
                                'Prefix': prefix}
    else:
        operation_parameters = {'Bucket': bucket}

    # instantiate temp variables.
    delete_list = []
    count_current = 0
    count_non_current = 0

    print("$ Paginating bucket " + bucket)
    for object_response_itr in object_response_paginator.paginate(**operation_parameters):
        for version in object_response_itr['Versions']:
            if version["IsLatest"] is True:
                count_current += 1
            elif version["IsLatest"] is False:
                count_non_current += 1
            if (today - version['LastModified']).days > delete_after_retention_days:
                delete_list.append({'Key': version['Key'], 'VersionId': version['VersionId']})

    # print objects count
    print("-" * 20)
    print("$ Before deleting objects")
    print("$ current objects: " + str(count_current))
    print("$ non-current objects: " + str(count_non_current))
    print("-" * 20)

    # delete objects 1000 at a time
    print("$ Deleting objects from bucket " + bucket)
    for i in range(0, len(delete_list), 1000):
        response = s3_client.delete_objects(
            Bucket=bucket,
            Delete={
                'Objects': delete_list[i:i + 1000],
                'Quiet': True
            }
        )
        print(response)

    # reset counts
    count_current = 0
    count_non_current = 0

    # paginate and recount
    print("$ Paginating bucket " + bucket)
    for object_response_itr in object_response_paginator.paginate(Bucket=bucket):
        if 'Versions' in object_response_itr:
            for version in object_response_itr['Versions']:
                if version["IsLatest"] is True:
                    count_current += 1
                elif version["IsLatest"] is False:
                    count_non_current += 1

    # print objects count
    print("-" * 20)
    print("$ After deleting objects")
    print("$ current objects: " + str(count_current))
    print("$ non-current objects: " + str(count_non_current))
    print("-" * 20)
    print("$ task complete")
And here is how I run it
python s3_cleanup.py --access_key_id="access-key" --secret_access_key="secret-key-here" --endpoint="https://s3.us-west-1.wasabisys.com" --bucket="ondemand-downloads" --prefix="" --delete_after_retention_days=5
If you want to delete files only from a specific folder, use the --prefix parameter.

- Works perfectly, though I think there should be an option not to input keys & endpoint. – Nic Wanavit Sep 28 '22 at 15:12
Edit:
Since Amazon introduced S3 Object Expiration on Dec 27, 2011, this answer is no longer valid.
No, S3 is just a datastore. You'll need to use some outside client to periodically delete the old files.

- This is no longer true: http://docs.amazonwebservices.com/AmazonS3/latest/UG/ObjectExpiration.html – Tabitha Oct 09 '12 at 23:44
- Ah, that makes things much easier. Although, for my backups I still prefer to perform the deletion from my script, to make sure older backups are only purged when a new one has been made successfully. – Martijn Heemels Oct 15 '12 at 08:34
I found a much faster solution: batch delete using the AWS CLI.
#!/usr/bin/env php
<?php
// remove files which were created more than 24 hrs ago
$fcmd = 'aws s3 ls s3://<bucket>/<prefix>/ | awk \'{$3=""; print $0}\''; // remove file size and handle file names with spaces
exec($fcmd, $output, $return_var);

$seconds_24_hour = 24 * 60 * 60;
$file_deleted_count = 0;
if (!empty($output)) {
    $deleted_keys = array();
    foreach ($output as $file) {
        $file_path = substr($file, 21);
        $file_time_stamp = substr($file, 0, 19); // e.g. 2017-09-19 07:59:41
        if (time() - strtotime($file_time_stamp) > $seconds_24_hour) {
            $deleted_keys[]["Key"] = "<prefix>/" . $file_path;
            $file_deleted_count++;
        }
    }
    if (!empty($deleted_keys)) {
        $json_data_delete = array("Objects" => $deleted_keys);
        echo $cmd = ("aws s3api delete-objects --bucket <bucket> --delete '" . json_encode($json_data_delete) . "'");
        system($cmd);
    }
    echo "\n$file_deleted_count files deleted from content_media\n";
}
Reference for batch delete: https://stackoverflow.com/a/41734090/1589444
Reference for handling file names with spaces (100% passing case): https://stackoverflow.com/questions/36813327/how-to-display-only-files-from-aws-s3-ls-command
