I have a Rails app that records an IP address from every request to a specific URL, but in my IP database I've found Facebook block IPs like 66.220.15.* and Google IPs (I suspect these come from bots). Is there any way to determine whether a request was made by a robot or search engine spider? Thanks
Viewed 9,512 times
4 Answers
31
Since well-behaved bots typically include a reference URI in the UA string they send, something like:
request.env["HTTP_USER_AGENT"].match(/\(.*https?:\/\/.*\)/)
is an easy way to see if the request is from a bot vs. a human user's agent. This seems to be more robust than trying to match against a comprehensive list.
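A minimal sketch of using this in a controller (the `bot_request?` helper name is my own, and `.to_s` guards against requests that omit the User-Agent header entirely):

class ApplicationController < ActionController::Base
  private

  # Well-behaved crawlers usually embed a reference URL in parentheses,
  # e.g. Googlebot sends "(+http://www.google.com/bot.html)".
  def bot_request?
    # .to_s avoids a NoMethodError when no User-Agent header is sent
    !request.env["HTTP_USER_AGENT"].to_s.match(/\(.*https?:\/\/.*\)/).nil?
  end
end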

tribalvibes
- +1 For this clever solution; however, bear in mind that Twitter doesn't follow this rule. Use `request.env["HTTP_USER_AGENT"].match(/Twitterbot\/1.0/)` instead. – CV-Gate May 21 '14 at 09:59
- While this is maybe a clever solution to catch most search engine bots, it is not maintainable and will most likely miss many bots. – Cyril Duchon-Doris Mar 06 '17 at 10:30
13
Robots are required (by common sense / courtesy more than any kind of law) to send along a User-Agent with their request. You can check for this using `request.env["HTTP_USER_AGENT"]` and filter as you please.
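For example, a minimal sketch of that kind of filter (the substring list here is illustrative, not exhaustive):

# Illustrative list only; real bot UA fragments vary widely
BOT_PATTERNS = %w[googlebot bingbot msnbot facebookexternalhit twitterbot crawler spider bot].freeze

def robot?(user_agent)
  ua = user_agent.to_s.downcase
  BOT_PATTERNS.any? { |pattern| ua.include?(pattern) }
end

robot?(request.env["HTTP_USER_AGENT"])  # => true for most well-known crawlers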

Ryan Bigg
- Thanks Ryan, yes, I made an array of robot user agents like: AM_I_ROBOT = ["googlebot", "twitterbot", "facebookexternalhit", "http://www.google.com/bot.html", "http://www.facebook.com/externalhit_uatext.php", "tweetmemebot", "sitebot", "msnbot", "robot", "bot"] – Agung Prasetyo May 05 '11 at 06:52
- Here's a list of user agents: http://www.user-agents.org/ with an XML feed: http://www.user-agents.org/allagents.xml – Sjors Provoost Aug 02 '11 at 23:26
- This gist fetches the names of all search engine bots and spammers from user-agents.org and throws them in an array: https://gist.github.com/1121578 It's a pretty long list. – Sjors Provoost Aug 03 '11 at 00:13
- @sjors That's great, except that database is notably missing, e.g., Facebook's fetcher. – tribalvibes Feb 14 '12 at 22:22
- Stumbled across this gem: https://github.com/biola/Voight-Kampff. It looks like it handles checking the User-Agent so you don't have to take an exhaustive approach to listing the User-Agents that correspond to bots. Seems like a good, quick solution. – user2977636 Mar 18 '17 at 18:38
3
Another way is to use the crawler_detect gem:
CrawlerDetect.is_crawler?("Bot user agent")
=> true
# or, after adding the Rack::Request extension:
request.is_crawler?
=> true
It can be useful if you want to detect a large variety of different bots (more than 1,000).
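As a sketch of how this could fit the original IP-logging use case (`record_ip!` is a hypothetical stand-in for the app's own logging code):

# Gemfile
gem "crawler_detect"

# In the controller action that records IPs:
def track
  # CrawlerDetect.is_crawler? expects the raw User-Agent string
  unless CrawlerDetect.is_crawler?(request.user_agent.to_s)
    record_ip!(request.remote_ip)  # hypothetical logging helper
  end
end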

Pavel K
- This one uses a regexp of user agents concatenated with `|`, but the `browser` gem uses `crawler_list.include?(current)`. This one may be faster. – srghma Sep 27 '19 at 15:36