
Over the last few days I have been monitoring my website logs and noticed a bot that is scanning my site heavily. It makes requests very frequently, about once every 5-10 seconds. I tried to block the bot by adding the rules shown below to robots.txt, but 1 or 2 hours later nothing had changed and the bot is still scanning.

What can I do to stop the bot? Is it a sign that someone is trying to do something harmful to my website?

User-agent: *
Allow: /

User-agent: SemrushBot
Disallow: /

Log example

185.191.171.11 - - [07/Nov/2022:11:38:59 +0200] "GET /example_store.php?id_class=123&id_product=1234&watch=20670906 HTTP/1.1" 500 5225 "-" "Mozilla/5.0 (compatible; SemrushBot/7~bl; +http://www.semrush.com/bot.html)"
Pablo Hoffman
kevx

1 Answer


Bots don't re-fetch robots.txt before every request. They cache the file for some amount of time, typically 24 hours. The bot's documentation doesn't say how long it caches the file, but I would expect that you need to wait more than a couple of hours before you see any change in the bot's behavior.
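While waiting for the cached copy to expire, it may be worth confirming that the rules themselves parse the way you intend. The following is a minimal sketch using Python's standard `urllib.robotparser`, with the rules from the question pasted in as a string. The name `SomeOtherBot` and the shortened URL path are placeholders for illustration, and this parser is only an approximation of how any particular crawler interprets the file.

```python
from urllib.robotparser import RobotFileParser

# The rules from the question, pasted verbatim.
ROBOTS_TXT = """\
User-agent: *
Allow: /

User-agent: SemrushBot
Disallow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Pass the bot's product token, not the full User-Agent header.
path = "/example_store.php?id_class=123"
semrush_allowed = parser.can_fetch("SemrushBot", path)
other_allowed = parser.can_fetch("SomeOtherBot", path)  # hypothetical bot name

print("SemrushBot allowed:", semrush_allowed)
print("SomeOtherBot allowed:", other_allowed)
```

If the first value is `False` and the second is `True`, the file says what you meant, and the remaining question is only whether and when the bot picks it up.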

It is also possible that this is some rogue bot that is impersonating a known crawler. If that is the case, it is unlikely to obey robots.txt at all.

If you want more immediate results, you can block the bot by IP address or by user agent at your firewall, your CDN, your load balancer, or your web server. For example, if your site runs on Apache, you can add a rule to your .htaccess file to block the bot. See "How to block a specific website browser agent?"
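If the site happens to be served by a Python application rather than through Apache rules, the same user-agent block can be sketched as a small WSGI middleware. This is only an illustration under that assumption: `block_bots`, `BLOCKED_AGENTS`, and `demo_app` are hypothetical names, and a user-agent match will also miss any bot that changes or hides its user agent string.

```python
# Hypothetical blocklist: lowercase substrings to look for in the User-Agent.
BLOCKED_AGENTS = ("semrushbot",)


def block_bots(app, blocked=BLOCKED_AGENTS):
    """Wrap a WSGI app and answer 403 to requests from blocked user agents."""

    def middleware(environ, start_response):
        user_agent = environ.get("HTTP_USER_AGENT", "").lower()
        if any(name in user_agent for name in blocked):
            start_response("403 Forbidden", [("Content-Type", "text/plain")])
            return [b"Forbidden\n"]
        # Everyone else is passed through to the real application.
        return app(environ, start_response)

    return middleware


def demo_app(environ, start_response):
    """Stand-in for the real site, used only to demonstrate the wrapper."""
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"Hello\n"]


application = block_bots(demo_app)
```

Rejecting the request this early also keeps the bot from reaching the page code at all, which is usually cheaper than rendering a full response for it.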

Stephen Ostermiller
  • I can't block the bot by IP because it uses a different address every time: 185.191.171.4 // 185.191.171.42 // 185.191.171.11 // 185.191.171.38 // 185.191.171.33 and so on... All of these connections were made within a single minute. – kevx Nov 10 '22 at 06:55
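The addresses listed in this comment all begin with 185.191.171, so a range-based rule may cover them even though no single address repeats. Here is a rough check using Python's standard `ipaddress` module. The /24 range is an assumption inferred from these five samples only, not a published list of the bot's addresses, so it is worth checking against more of the log before blocking it.

```python
import ipaddress

# Addresses quoted in the comment above.
observed = [
    "185.191.171.4",
    "185.191.171.42",
    "185.191.171.11",
    "185.191.171.38",
    "185.191.171.33",
]

# Assumed range, inferred from the shared 185.191.171 prefix.
candidate = ipaddress.ip_network("185.191.171.0/24")

all_in_range = all(ipaddress.ip_address(ip) in candidate for ip in observed)
print(f"All observed addresses fall inside {candidate}: {all_in_range}")
```

If the check holds for the rest of the log too, most firewalls and servers accept a range in this CIDR form in place of a single address.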