-2

As far as I know, to keep robots off a web site we have to parse the 'User-Agent' header of each HTTP request and check whether the request comes from a robot or a browser.

I think we cannot completely prevent robots from accessing our web sites, because anyone can write a program that uses an HTTP client to send requests with a FAKE browser User-Agent. In that case we cannot tell whether the User-Agent really comes from a browser or was set by a robot program.

My question: is there any way to completely prevent robots from accessing our web sites?

LHA

3 Answers

1

You cannot eliminate the bots, but you can greatly reduce them.

The obvious option, which you're already using, is user-agent detection.
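
As a rough illustration of user-agent detection and its limits, here is a minimal Python sketch. The marker list and the function name `looks_like_bot` are assumptions made for this example, not a standard list or API; a real deployment would need its own rules.

```python
# Hypothetical substrings that many self-identifying crawlers and HTTP tools
# include in their User-Agent header. This list is an assumption, not a standard.
BOT_MARKERS = ("bot", "crawler", "spider", "curl", "wget", "python-requests")


def looks_like_bot(user_agent):
    """Return True if the User-Agent is missing or contains a known bot marker."""
    if not user_agent:
        # Simple scripts often send no User-Agent header at all.
        return True
    ua = user_agent.lower()
    return any(marker in ua for marker in BOT_MARKERS)


# A crawler that identifies itself honestly is caught...
honest = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
# ...but a script that copies a browser's header is not.
spoofed = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36"
)

print(looks_like_bot(honest))   # True
print(looks_like_bot(spoofed))  # False
```

The second result is the weakness raised in the question: the header is whatever the client chooses to send, so this check only filters bots that do not bother to disguise themselves.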

You could also load your page content through AJAX using JavaScript, which would eliminate any bot that cannot process JavaScript. Just have an empty div with id="content" and, on page ready, make an AJAX call to insert the content. Anyone using curl or similar to scrape your page would then get no content. If the bot is built for your site specifically, this is easy to work around, but most random bots probably wouldn't get through it.
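
The empty-div idea could be sketched roughly as follows, with the server side written in Python. The `/content` path, the sample article text, and the `handle_request` function are all assumptions for illustration; the embedded script uses the browser's `fetch` for the AJAX call.

```python
# Hypothetical page content that should only appear after JavaScript runs.
ARTICLE_HTML = "<p>The real article text lives here.</p>"

# The shell served at the page URL: an empty div plus a script that fills it.
# "/content" is an assumed endpoint name.
SHELL_HTML = """<!DOCTYPE html>
<html>
  <body>
    <div id="content"></div>
    <script>
      document.addEventListener("DOMContentLoaded", function () {
        fetch("/content")
          .then(function (response) { return response.text(); })
          .then(function (html) {
            document.getElementById("content").innerHTML = html;
          });
      });
    </script>
  </body>
</html>"""


def handle_request(path):
    """Return (status, body) for a request path."""
    if path == "/":
        return 200, SHELL_HTML
    if path == "/content":
        return 200, ARTICLE_HTML
    return 404, "Not found"


status, body = handle_request("/")
# A client that does not execute JavaScript only ever sees the empty shell.
print("real article text" in body)  # False
```

A bot written for this specific site could simply request `/content` directly, which is why this only deters generic scrapers.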

You could also obfuscate the target URL in JS, and/or make it automatic by using location.href to tell the AJAX call to look for a content file of the same name in a different folder.

You could, of course, put a captcha in front of the site before a user (or bot) can enter, but that's annoying to users.

If it's less about accessing the page and more about form submission, then a captcha is a great choice. Alternatively, you could use a honeypot: add a form field that is hidden by CSS. A robot will fill out that field but a human won't (because it's hidden), and you can detect that.
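
A minimal server-side sketch of the honeypot check, in Python. The field name `website` and the function `is_probable_bot` are hypothetical; the form data is assumed to arrive as a plain dictionary.

```python
# Hypothetical name of the form field that is hidden from humans with CSS,
# for example: <input type="text" name="website" class="hp" autocomplete="off">
# together with a rule such as: .hp { display: none; }
HONEYPOT_FIELD = "website"


def is_probable_bot(form):
    """Return True if the hidden honeypot field was filled in."""
    value = form.get(HONEYPOT_FIELD, "")
    return value.strip() != ""


human_submission = {"name": "Alice", "comment": "Nice post", "website": ""}
bot_submission = {"name": "x", "comment": "buy now", "website": "http://spam.example"}

print(is_probable_bot(human_submission))  # False
print(is_probable_bot(bot_submission))    # True
```

Giving the hidden field a plausible name is a deliberate choice here, on the assumption that form-filling bots are more likely to populate a field that looks ordinary.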

Lenny
0

Other than placing your pages behind some kind of authentication method, the answer is no.

Obviously, the authentication would also apply to humans.

driis
0

I think that authentication with a captcha is the easiest and most widely used way. Another option would be to ask the user simple questions (simple for humans but not for bots). However, all these methods are annoying for human users.
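
The "simple question" idea might look roughly like this in Python. The question bank, the ids, and the `check_answer` function are made up for illustration only.

```python
# Hypothetical question bank: id -> (question text, accepted answers in lowercase).
QUESTIONS = {
    "q1": ("What colour is the sky on a clear day?", {"blue"}),
    "q2": ("How many legs does a cat have?", {"4", "four"}),
}


def check_answer(question_id, answer):
    """Return True if the answer matches, ignoring case and surrounding whitespace."""
    entry = QUESTIONS.get(question_id)
    if entry is None:
        # Unknown question id: treat as a failed challenge.
        return False
    _, accepted = entry
    return answer.strip().lower() in accepted


print(check_answer("q2", " Four "))  # True
print(check_answer("q1", "green"))   # False
```

A small fixed bank like this can be learned by a bot that targets the site, so it shares the limits of the other approaches.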

HAL9000