
Is there a reliable way to prevent Google from crawling/indexing/caching a page?

I am thinking about creating a product where users could temporarily share information, using temporary URLs.

The information is not very confidential, but I'd definitely prefer that it not show up in some cache or even in search results.

What's the most reliable way of doing this, and what are the possible pitfalls?

Sonic Soul

1 Answer


Make a robots.txt file. See http://www.robotstxt.org/ for information. One pitfall to be aware of: robots.txt only tells compliant crawlers not to fetch the pages; if another site links to one of your URLs, Google can still list the bare URL in results without ever crawling it.
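A minimal sketch of such a file, served at the site root (e.g. `https://example.com/robots.txt`). The `/share/` path is illustrative, assuming the temporary URLs live under one common prefix:

```
# Ask all compliant crawlers to skip the temporary-share area.
# "/share/" is a hypothetical path prefix for the temporary URLs.
User-agent: *
Disallow: /share/
```

Keeping all temporary pages under one prefix makes the rule simple; disallowing `/` entirely would also block crawling of your public landing pages.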

You can also use <META NAME="ROBOTS" CONTENT="NOINDEX, NOFOLLOW"> in the <HEAD> of each page you want kept out of the index. Since you specifically want to avoid cached copies, add NOARCHIVE as well, which tells Google not to store or show a cached version. Note that the meta tag only works if the crawler is allowed to fetch the page, so don't combine it with a robots.txt Disallow for the same URL.
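An alternative that doesn't require editing the HTML is the X-Robots-Tag HTTP response header, which carries the same directives and also covers non-HTML resources (PDFs, images). A minimal sketch, assuming a WSGI-style framework where you can append to a list of response header tuples (the helper name is hypothetical):

```python
def add_noindex_header(headers):
    """Append an X-Robots-Tag header so compliant crawlers
    neither index the response nor keep a cached copy of it."""
    headers.append(("X-Robots-Tag", "noindex, noarchive"))
    return headers

# Usage: call this on the headers of every temporary-share response.
headers = add_noindex_header([("Content-Type", "text/html")])
```

As with the meta tag, the header is only seen if the page is actually crawled.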

Google also publishes its own documentation describing exactly how its crawlers interpret robots.txt and the robots meta directives.

Gordon
  • thanks. i am just wondering how realistic it is to expect to have this file, and have it work in the way i described in my question. do you have any experience with using it? – Sonic Soul Dec 26 '12 at 18:20
  • It is very standard, and the only agent I've seen ignore it is Baidu (which I have since forcibly blocked via iptables). – Gordon Dec 31 '12 at 19:50