I have a hosted WordPress website (not on AWS). Instead of storing large files (i.e. audio and video) on the web server, I use HTML href links that point to where they are stored in an S3 bucket. For example, if someone wants to listen to (or download) a recording, the website HTML is something like this: <a href="https://cdnxxxxxx.s3.eu-west-1.amazonaws.com/name_of_audio_file.mp3">Listen to NAME of AUDIO FILE</a>
My issue is that the bucket policy was originally public read-only (i.e. Principal "*", s3:GetObject), but web scrapers and others are following the links and scraping the data (I currently monitor and analyse the S3 access logs). I have amended the policy to the following, which enforces TLS 1.2 or higher and (I hope) denies a specific list of scrapers:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "PublicReadGetObjectIF-SSL>1.1",
      "Effect": "Allow",
      "Principal": "*",
      "Action": "s3:GetObject",
      "Resource": "arn:aws:s3:::cdnxxxxxx/*",
      "Condition": {
        "NumericGreaterThan": {
          "s3:TlsVersion": "1.1"
        }
      }
    },
    {
      "Sid": "UserAgentDenyGetObject",
      "Effect": "Deny",
      "Principal": "*",
      "Action": "s3:GetObject",
      "Resource": "arn:aws:s3:::cdnxxxxxx/*",
      "Condition": {
        "StringLike": {
          "aws:UserAgent": [
            "*Baiduspider*",
            "*AhrefsBot*",
            "*Semrush*",
            "*yandex*",
            "*2ip*",
            "*ALittle*",
            "*ZoominfoBot*",
            "*cpp-httplib*",
            "*Expanse*",
            "*8LEGS*",
            "*coccocbot*",
            "*Pandalytics*"
          ]
        }
      }
    }
  ]
}
Should I instead amend the policy to deny access to everything except the Cloudflare IP ranges (IPv4 and IPv6) and my server's IP? Or is there a more secure method that I haven't thought of?
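To make the IP-restricted idea concrete, this is roughly the extra statement I have in mind, added alongside the existing Allow statement. It is only a sketch: the three CIDR ranges are documentation placeholders (not real Cloudflare ranges and not my server's address), the Sid is made up, and it assumes that requests really do reach S3 from Cloudflare or from my server rather than directly from visitors' browsers.

```json
{
  "Sid": "DenyGetObjectOutsidePlaceholderRanges",
  "Effect": "Deny",
  "Principal": "*",
  "Action": "s3:GetObject",
  "Resource": "arn:aws:s3:::cdnxxxxxx/*",
  "Condition": {
    "NotIpAddress": {
      "aws:SourceIp": [
        "192.0.2.0/24",
        "2001:db8::/32",
        "203.0.113.10/32"
      ]
    }
  }
}
```

As I understand it, an explicit Deny overrides the Allow, so any GetObject request from an address outside the listed ranges would be refused regardless of its user agent.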
Many thanks