Lately I've been facing issues with a data-mining bot that extracts data from my website every day at certain hours. This not only wastes my bandwidth but also feeds wrong data into my Google Analytics.
It usually comes in from amazonaws IPs, but lately it has switched to another host.
What remains constant is the user agent. Is there a way to block by user agent? I've tried the following, but it failed. Hopefully someone can shed some light on this.
RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Ubuntu HeadlessChrome HeadlessChrome Safari/537.36
RewriteRule .* - [R=503,L]
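As it turns out, this fails because the CondPattern contains unquoted spaces, so Apache parses everything after the first space as extra arguments to RewriteCond, and the parentheses and dots in the UA string are regex metacharacters on top of that. Quoting the pattern and prefixing it with = makes Apache do an exact string comparison instead. A minimal corrected version (the same form the full update below uses):

RewriteEngine On
# Quoted CondPattern; the leading "=" makes this an exact string match, not a regex
RewriteCond %{HTTP_USER_AGENT} "=Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Ubuntu HeadlessChrome HeadlessChrome Safari/537.36"
RewriteRule .* - [F]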
Update: This is my updated .htaccess, for future reference, in case it helps the community see how the whole file should look. Thanks MrWhite!
# Host-specific mod_security rule exclusions
<IfModule mod_security2.c>
SecRuleRemoveById 211170
SecRuleRemoveById 211180
</IfModule>
# Follow symlinks, disable directory listings
Options +FollowSymLinks
Options -Indexes

# Block direct access to template, config, and log files (robots.txt stays accessible)
<FilesMatch "(?i)(\.(tpl|ini|log)|(?<!robots)\.txt)$">
Require all denied
</FilesMatch>
RewriteEngine On

# Block the bot by its exact user-agent string; the leading "=" makes this
# a literal string comparison instead of a regex
RewriteCond %{HTTP_USER_AGENT} "=Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Ubuntu HeadlessChrome HeadlessChrome Safari/537.36"
RewriteRule .* - [F]

# SEO URL Settings
RewriteBase /
RewriteRule ^sitemap\.xml$ index.php?route=extension/feed/google_sitemap [L]
RewriteRule ^googlebase\.xml$ index.php?route=extension/feed/google_base [L]
RewriteRule ^system/download/(.*) index.php?route=error/not_found [L]
# Send everything that is not a real file, directory, or static asset to index.php
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteCond %{REQUEST_URI} !.*\.(ico|gif|jpg|jpeg|png|js|css)
RewriteRule ^([^?]*) index.php?_route_=$1 [L,QSA]
# Keep the custom 403 error page itself accessible
<Files 403.shtml>
Require all granted
</Files>
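On a side note, if the bot ever starts rotating the rest of its UA string, the exact match above will stop working. A more tolerant sketch using mod_setenvif instead of mod_rewrite (assuming Apache 2.4 and that AllowOverride permits these directives; I'm keying on the "HeadlessChrome" token):

# Flag any request whose User-Agent contains "HeadlessChrome" (case-insensitive)
SetEnvIfNoCase User-Agent "HeadlessChrome" block_bot
# Deny flagged requests with a 403, allow everyone else
<RequireAll>
Require all granted
Require not env block_bot
</RequireAll>

The trade-off is that a substring match like this also blocks any other visitor running headless Chrome, which in my case is fine.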