
Is it possible to prevent a site from being scraped by any scraper, but at the same time allow search engines to index your content?

Just checking the User-Agent is not the best option, because it's very easy to spoof.

JavaScript checks (Google executes JS) could be an option, but a good scraper can do that too.

Any ideas?

user584397

2 Answers


Checking link access times might be possible; in other words, if the front page is hit and then the links on the front page are all hit "quickly", it's probably a bot.

Even easier, drop some hidden links in the page; bots will follow them, people almost never will.
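A minimal honeypot sketch (assuming Flask; the /trap-link URL, the invisible anchor, and the in-memory trapped_ips set are purely illustrative):

```python
from flask import Flask, abort, request

app = Flask(__name__)
trapped_ips = set()  # illustration only; persist this (e.g. in Redis) for real use

@app.before_request
def block_trapped_clients():
    # Once an IP has followed the hidden link, refuse it everywhere.
    if request.remote_addr in trapped_ips:
        abort(403)

@app.route("/trap-link")
def trap():
    # Only crawlers that blindly follow every href end up here;
    # humans never see the link.
    trapped_ips.add(request.remote_addr)
    abort(403)

@app.route("/")
def index():
    # The honeypot link is present in the markup but invisible to people.
    return ('<a href="/trap-link" style="display:none" rel="nofollow"></a>'
            '<p>Real content here.</p>')
```

To avoid trapping legitimate search engines, disallow the honeypot URL in robots.txt; well-behaved bots obey it and never hit the trap, while scrapers that ignore robots.txt walk right into it.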

Dave Newton
  • But won't it block search engine bots too? – user584397 May 28 '12 at 14:53
  • @user584397 Most legitimate search bots identify themselves as such, no? I mean, you're trying to do something that's essentially impossible, because any bot could completely execute a page, understand what's hidden, put in random delays to simulate human browsing, etc. Your expectations have to be reasonable, and whatever you do should have a reasonable ROI. – Dave Newton May 28 '12 at 14:55

Use DNS checking, Luke! :)

  1. Check the User-Agent to see if the client identifies itself as a search engine bot
  2. If so, get the IP address requesting the page
  3. Do a reverse DNS lookup on that IP address to get a hostname
  4. Do a forward DNS lookup on that hostname and verify it resolves back to the original IP address (for Google, the hostname should also end in googlebot.com or google.com)

The same idea is described in Google's help article Verifying Googlebot.
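The four steps translate almost directly into code. A minimal sketch using Python's standard socket module (the function name and the googlebot.com / google.com suffixes are just illustrative; other engines document their own crawl host suffixes):

```python
import socket

def is_verified_googlebot(ip: str) -> bool:
    """Return True only if `ip` reverse-resolves to a Google crawl host
    and that host forward-resolves back to the same IP."""
    try:
        hostname, _, _ = socket.gethostbyaddr(ip)        # reverse DNS lookup
    except socket.herror:
        return False                                     # no PTR record at all
    if not hostname.endswith((".googlebot.com", ".google.com")):
        return False                                     # not Google's crawl space
    try:
        _, _, forward_ips = socket.gethostbyname_ex(hostname)  # forward DNS lookup
    except socket.gaierror:
        return False
    return ip in forward_ips                             # must round-trip to the same IP
```

You would only call this for requests whose User-Agent claims to be a search bot, and cache the result per IP, since the two DNS lookups are relatively slow.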

Rusted