12

I want to create a private URL such as:

http://domain.com/content.php?secret_token=XXXXX

Then, only visitors who have the exact URL (e.g. received by email) can see the page. We check the $_GET['secret_token'] before displaying the content.

My problem is that if search bots somehow find the URL, they will index it and the URL will become public. Is there a practical way to prevent bot visits and subsequent indexing?

Possible But Unfavorable Methods:

  1. Login system (e.g. by php session): But I do not want to offer user login.

  2. Password-protected folder: The problem is as above.

  3. Using robots.txt: Many search engine bots do not respect it.

Googlebot

6 Answers

7

What you are talking about is security through obscurity. It's never a good idea. If you must, I would offer these thoughts:

  • Make the link expire
  • Lock the link to the class C or D IP range it was first accessed from
  • Have the page challenge the user with something like a logic question before forwarding to the real page with a time-sensitive token (a two-step process). If the challenge fails, send back a 404 so the crawler stops.
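As a rough illustration of the "make the link expire" idea, here is a minimal sketch that puts an expiry timestamp inside the token and signs it with an HMAC. The function names, the `$key` parameter and the token layout are my own assumptions, not something from the question; the key would have to be kept secret on the server.

```php
<?php
// Hypothetical helpers: names, key handling and token layout are assumptions.

// Build a token of the form "<expiry timestamp>.<HMAC of that timestamp>".
function make_expiring_token(string $key, int $ttlSeconds, ?int $now = null): string
{
    $expires = ($now ?? time()) + $ttlSeconds;
    $mac = hash_hmac('sha256', (string) $expires, $key);
    return $expires . '.' . $mac;
}

// Return true only if the signature matches and the expiry has not passed.
function is_token_valid(string $token, string $key, ?int $now = null): bool
{
    $parts = explode('.', $token, 2);
    if (count($parts) !== 2 || preg_match('/^\d+$/', $parts[0]) !== 1) {
        return false;
    }
    [$expires, $mac] = $parts;
    $expected = hash_hmac('sha256', $expires, $key);
    // hash_equals compares in constant time to avoid timing leaks.
    return hash_equals($expected, $mac) && (int) $expires >= ($now ?? time());
}
```

The optional `$now` argument is only there so the expiry logic can be exercised without waiting; in content.php you would presumably call `is_token_valid($_GET['secret_token'], $key)` and let it default to the current time.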
hakre
CrazyDart
3

Try generating a 5-6 character alphanumeric password and sending it along with the email, so even though robots may spider the URL, they still need the password to access the page. (Just an extra safety measure.)
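A minimal sketch of generating such a code, assuming PHP 7 or later for `random_int`. The function name `generate_access_code` is hypothetical.

```php
<?php
// Hypothetical helper: generate an alphanumeric access code of the given length.
function generate_access_code(int $length = 6): string
{
    $alphabet = 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789';
    $max = strlen($alphabet) - 1;
    $code = '';
    for ($i = 0; $i < $length; $i++) {
        // random_int draws from a cryptographically secure source.
        $code .= $alphabet[random_int(0, $max)];
    }
    return $code;
}
```

On the server side you would probably store only `password_hash($code, PASSWORD_DEFAULT)` and check submissions with `password_verify`, rather than keeping the code in plain text.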

Shankar Narayana Damodaran
1
  • If there is no link to it (including that the folder has no index view), the robot won't find it
  • You could return a 404 if the token is wrong. That way, a robot (or anyone else who doesn't have the token) will think there is no such page
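The 404 idea could look something like the sketch below. `status_for_request` and `$storedToken` are names I made up for illustration; `$query` stands in for `$_GET`.

```php
<?php
// Hypothetical helper: decide the HTTP status for a request to content.php.
// $query stands in for $_GET; $expected is the token stored on the server.
function status_for_request(array $query, string $expected): int
{
    $supplied = $query['secret_token'] ?? '';
    // Reject non-string input (e.g. ?secret_token[]=x) and wrong tokens alike.
    if (!is_string($supplied) || !hash_equals($expected, $supplied)) {
        return 404; // indistinguishable from a page that does not exist
    }
    return 200;
}

// In content.php this might be used as:
// $status = status_for_request($_GET, $storedToken);
// http_response_code($status);
// if ($status !== 200) { exit; }
```

Using `hash_equals` instead of `==` avoids leaking information about the token through response timing.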
Eugen Rieck
  • 1
    "If there is no link to it": I would say if it is ever sent in an email there is a good chance it will be indexed. – CrazyDart Feb 09 '12 at 18:20
  • 1
    Well, if it is sent in an e-mail WITH the token, and this gets indexed, you are out of luck. With any solution. If you consider the contents of all e-mails public, then no authentication short of a hardware dongle will help: The moment you mail somebody the token/password whatever, the moment everybody knows it. – Eugen Rieck Feb 10 '12 at 01:55
1

As long as you don't link to it, no spider will pick it up. And, since you don't want any password protection, the link is going to work for everyone. Consider disabling the secret key after it is used.
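To illustrate "disabling the secret key after it is used", here is a minimal sketch of a one-time token store. The class `OneTimeTokens` is hypothetical and keeps tokens in memory only; a real page would need persistent storage such as a database table. It assumes PHP 7.4 or later for the typed property.

```php
<?php
// Hypothetical in-memory store; a real page would persist this in a database.
final class OneTimeTokens
{
    /** @var array<string, bool> token => still usable */
    private array $tokens = [];

    // Create a fresh random token and mark it as usable.
    public function issue(): string
    {
        $token = bin2hex(random_bytes(16)); // 32 hex characters
        $this->tokens[$token] = true;
        return $token;
    }

    // Returns true on the first valid use, then disables the token.
    public function redeem(string $token): bool
    {
        if (empty($this->tokens[$token])) {
            return false; // unknown or already used
        }
        $this->tokens[$token] = false;
        return true;
    }
}
```

With this shape, a bot that finds the URL after the recipient has opened it would get nothing, at the cost of the link working only once.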

Lg102
  • 2
    "As long as you don't link to it, no spider will pick it up": That's not true at all. Spiders get seeded from many sources, not just links from other pages. – CrazyDart Feb 09 '12 at 18:20
1

You only need to tell the search engines not to index /content.php. Search engines that honor robots.txt won't index any URL whose path starts with /content.php.
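For reference, a robots.txt rule along those lines would look roughly like this. `Disallow` matches by path prefix, so it also covers /content.php?secret_token=XXXXX. This sketch assumes the file is served from the site root and that the rule should apply to all crawlers.

```
User-agent: *
Disallow: /content.php
```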

zzzzBov
  • 1
    one bad bot is enough to make my private content publicly available! – Googlebot Feb 09 '12 at 18:41
  • if you're emailing a secret token to users, your private content is already publicly available. The only way a bot would find the secret token is if someone linked to it somewhere that the bot had access to. If that happens, your secret is not a secret anyway. – zzzzBov Feb 09 '12 at 19:05
  • As others pointed out, it can happen in different ways, e.g. via a visitor's browser toolbar. – Googlebot Feb 09 '12 at 21:22
1

Leaving the link unpublished will be ok in most circumstances...

...However, I will warn you that the prevalence of browser toolbars (Google and Yahoo come to mind) changes the game. One company I worked for had pages from their intranet indexed in Google. You could search for the page, and a few results came up, but you couldn't access them unless you were inside our firewall or VPN'd in.

We figured the only way those links got propagated to Google had to be through the toolbar. (If anyone else has a better explanation, I'd love to hear it...) I've been out of that company a while now, so I don't know if they ever figured out definitively what happened there.

I know, strange but true...

Tim
  • You are quite right! This is the reason I am afraid of sharing a private URL as described above. – Googlebot Feb 09 '12 at 18:41
  • I think you'll need to send a code in the email (as mentioned in an earlier post) or something like a CAPTCHA to keep the bots away and be sure you're dealing with a person. – Tim Feb 09 '12 at 18:53