1

This bot (MSNbot) doesn't respect the Disallow rules in my robots.txt.

I have this in robots.txt:

User-agent: Msnbot
Disallow: /

User-Agent: Msnbot/2.0b
Disallow: /

Until now it crawled fairly slowly, but now it is a monster that won't leave my site at all. It crawls my entire WordPress and MyBB installation 24/7.

Should I block IP ranges, or what else can I do to stop all of these content stealers?

halfer
user3238424
  • possible duplicate of [Blocking Bots by Modifying Htaccess](http://stackoverflow.com/questions/14944780/blocking-bots-by-modifying-htaccess) – halfer Mar 02 '14 at 13:56
  • There's quite a lot of potential duplicates in the _Related_ section in the right-hand pane. – halfer Mar 02 '14 at 13:59
  • @halfer, thanks. But, using that method I will have to do many things. I need some easier way to block all bots except Google Bot. I need to add RewriteCond %{HTTP_USER_AGENT} for every bot that I want to block this way. – user3238424 Mar 02 '14 at 14:14
  • You did specifically ask about one bot. If you block by IP range you'll likely have the same problem, unless there are many bots coming from the same range. – halfer Mar 02 '14 at 14:18
  • Yeah, you are right, I asked about the MSN bot because it ignores robots.txt. If I can block everything except Googlebot in .htaccess, then I will only need rules for Googlebot in robots.txt, which will be nicer and easier. From D. Kasipovic's answer I made this code: http://pastebin.com/w8719E4c. I don't know whether it will work; I have never tried this. – user3238424 Mar 02 '14 at 14:24
  • They really have no respect for robots.txt. I blocked the Bing agent yesterday, and today I have huge traffic from the following IPs (MSN China): http://hostingcompass.com/whois/103.25.156.0 http://hostingcompass.com/whois/111.221.28.0 (with user agent Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/34.0.1847.131 Safari/537.36; no Bing or MSN traces) – Pablo Jul 13 '16 at 20:36

3 Answers

2

Based on "Block by useragent or empty referer", you could use something like this in your .htaccess:

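Options +FollowSymlinks
RewriteEngine On
RewriteBase /
# Flag requests whose User-Agent starts with "msnbot" (case-insensitive match)
SetEnvIfNoCase User-Agent "^Msnbot" ban_agent
# Refuse flagged requests (Apache 2.2 syntax; Apache 2.4 needs mod_access_compat for this directive)
Deny from env=ban_agent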
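
To see what the `^Msnbot` pattern does and does not catch, here is a minimal Python sketch that mimics a case-insensitive regular-expression test on the User-Agent header. The `is_banned` helper and the sample User-Agent strings are assumptions made for illustration, not values taken from real logs. It only aims to show that an anchored pattern misses crawlers whose User-Agent begins with `Mozilla/5.0`, and that a single alternation can cover several bot names at once.

```python
import re

# Hypothetical helper: case-insensitive regex test on a User-Agent string.
def is_banned(user_agent: str, pattern: str) -> bool:
    return re.search(pattern, user_agent, re.IGNORECASE) is not None

# Sample User-Agent strings (assumed for illustration).
MSNBOT_UA = "msnbot/2.0b (+http://search.msn.com/msnbot.htm)"
BINGBOT_UA = "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)"
BROWSER_UA = "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 Chrome/34.0.1847.131 Safari/537.36"

ANCHORED = r"^Msnbot"        # the pattern from the answer
BROAD = r"(msnbot|bingbot)"  # unanchored alternation covering two bot names

# Print, for each sample, whether the anchored and the broad pattern match.
for ua in (MSNBOT_UA, BINGBOT_UA, BROWSER_UA):
    print(is_banned(ua, ANCHORED), is_banned(ua, BROAD))
```

Neither pattern can catch a crawler that sends an ordinary browser User-Agent, which is the situation Pablo describes in the comments on the question.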
Community
dkasipovic
  • @D. Kasipovic, thank you for the answer. Based on your answer I created this .htaccess: http://pastebin.com/w8719E4c Will this block all crawlers/bots except Google's? – user3238424 Mar 02 '14 at 14:12
  • First of all, .com.ba? Bosnia? :) Another thing: I think that .htaccess will block ALL access except Googlebot's, including regular users. – dkasipovic Mar 02 '14 at 14:59
  • Yes, .com.ba is the domain; it's Bosnian hosting. Do you know what to fix in the code so that it doesn't block users, if that is possible? If not, I'll use your code. – user3238424 Mar 02 '14 at 15:23
  • 1
    You cannot block "all bots" per se; you need to block each bot separately. I'm from Bosnia too, that's why I asked :) – dkasipovic Mar 02 '14 at 22:05
0

Here's what you need to do instead:

Code:

User-agent: *
Disallow:

User-agent: MSNbot
Disallow: /

The above code allows all robots except MSNbot.
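
One way to check how a compliant crawler would read these rules is Python's standard `urllib.robotparser` module. This is only a sketch: the `example.com` URL is a placeholder, and the result shows what a well-behaved parser concludes, not what any particular bot actually does.

```python
from urllib.robotparser import RobotFileParser

# The rules from this answer, one directive per line.
ROBOTS_TXT = """\
User-agent: *
Disallow:

User-agent: MSNbot
Disallow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Placeholder URL; with these rules any path on the site gives the same result.
url = "http://example.com/forum/index.php"

msnbot_allowed = parser.can_fetch("msnbot", url)
googlebot_allowed = parser.can_fetch("Googlebot", url)
print(msnbot_allowed, googlebot_allowed)
```

robots.txt is advisory, so a crawler that ignores it, as the question describes, has to be refused on the server side instead.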

You can read more about the robots exclusion protocol here.

For example, for Bing:

User-agent: MSNBot
Disallow: /

For Google:

User-agent: googlebot
Disallow: /

If you want to block all bots, use this:

User-agent: *
Disallow: /
sreenivas
0

Though I was unable to identify specific bots that visit my site and spend 0:00 time per page, I was able to identify the countries where these attacks are coming from.

Since the attacks come almost entirely from China and the US, I'm going to block those countries from visiting my website completely, using my .htaccess file. I hope it works.

I only recommend this if you know you only want traffic from your country and nowhere else, and you're sure you're not losing traffic that you want to get from countries you want to ban.

Here are the links to the tutorial:

https://www.hostinger.com/tutorials/htaccess/how-to-allow-or-block-visitors-from-specific-countries-using-htaccess

https://www.countryipblocks.net/acl.php
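
Country block lists like the ones linked above are typically distributed as CIDR ranges that end up in deny rules. As a rough illustration of the matching involved, here is a Python sketch using the standard `ipaddress` module. The ranges are reserved documentation networks used as placeholders, not real country data, and `is_blocked` is a hypothetical helper.

```python
import ipaddress

# Placeholder ranges (RFC 5737 documentation networks), not real country data.
BLOCKED_RANGES = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("198.51.100.0/24"),
]

def is_blocked(client_ip: str) -> bool:
    """Return True if client_ip falls inside any blocked range."""
    address = ipaddress.ip_address(client_ip)
    return any(address in network for network in BLOCKED_RANGES)

print(is_blocked("192.0.2.57"))   # inside the first placeholder range
print(is_blocked("203.0.113.9"))  # outside both placeholder ranges
```

Every visitor in a listed range is refused, bot or not, which is why this approach only suits sites that do not want any traffic from those ranges.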

I have just implemented this, and I hope it works. It seems like a good solution for me because my Canadian traffic is good, while the US and China traffic all seems to consist of attacks.

Again, I recommend discretion when using a solution like this.

Alex Banman