35

I don't want the search engines to index my imprint page. How could I do that?

Braiam
Proud Member

7 Answers

50

You can also add the following meta tag to the <head> of that page:

<meta name="robots" content="noindex,nofollow" />
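To confirm that a page really carries the tag, a minimal sketch like the following can help. It uses only Python's standard `html.parser`; the names `RobotsMetaFinder` and `robots_directives` are made up for this illustration.

```python
from html.parser import HTMLParser


class RobotsMetaFinder(HTMLParser):
    """Collects the directives of every <meta name="robots"> tag."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        # Self-closing tags such as <meta ... /> also end up here by default.
        if tag != "meta":
            return
        attr = dict(attrs)
        if (attr.get("name") or "").lower() == "robots":
            content = attr.get("content") or ""
            self.directives.update(
                part.strip().lower() for part in content.split(",") if part.strip()
            )


def robots_directives(html):
    """Return the set of robots directives found in an HTML string."""
    finder = RobotsMetaFinder()
    finder.feed(html)
    return finder.directives


page = '<html><head><meta name="robots" content="noindex,nofollow" /></head></html>'
print(sorted(robots_directives(page)))
```

An empty result means the page has no robots meta tag at all.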
seriyPS
  • 8
    This is a better solution than using robots.txt. The reason being, if you robots.txt a page out, search engines won't even visit the page. If there are links pointing to the page, they won't remove it from the index because you haven't told them to. Google will show the page without a description, because they know about the page but don't know what's on the page. The only way to explicitly remove it from the index is to tell the engines that you don't want it displayed at all with the 'noindex' command. – eywu Nov 02 '10 at 22:52
  • 2
    This is a bit of a problem (it takes extra coding time) if the head is included dynamically by a server-side language like PHP, since it will then be the same for all pages. – Syed Waqas Bukhary Jun 12 '15 at 21:48
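For a shared, server-generated head, the extra work can be as small as one conditional. The sketch below is in Python rather than PHP, and `NOINDEX_PATHS` and `render_head` are hypothetical names chosen for this illustration, not part of any framework.

```python
from html import escape

# Hypothetical set of paths that should stay out of search results.
NOINDEX_PATHS = {"/imprint-page.htm"}


def render_head(path, title):
    """Build the shared <head>, adding the robots meta tag only where needed."""
    lines = ["<head>", f"  <title>{escape(title)}</title>"]
    if path in NOINDEX_PATHS:
        lines.append('  <meta name="robots" content="noindex,nofollow" />')
    lines.append("</head>")
    return "\n".join(lines)


print(render_head("/imprint-page.htm", "Imprint"))
```

Every other page keeps the common head unchanged.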
33

You need a simple robots.txt file. Basically, it's a text file that tells search engines not to index particular pages.
You don't need to include it in the header of your page; as long as it's in the root directory of your website it will be picked up by crawlers.
Create it in the root folder of your website and put the following text in:

User-Agent: *
Disallow: /imprint-page.htm

Note that you'd replace imprint-page.htm in the example with the actual name of the page (or the directory) that you wish to keep from being indexed.
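Before uploading the file, you can check what the rules do with Python's standard `urllib.robotparser`. This is a minimal sketch: `is_crawlable` is a helper name invented here, and the example.com URLs are placeholders.

```python
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """\
User-Agent: *
Disallow: /imprint-page.htm
"""


def is_crawlable(robots_txt, url, user_agent="*"):
    """Return True if the given rules let user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


print(is_crawlable(ROBOTS_TXT, "http://www.example.com/imprint-page.htm"))
print(is_crawlable(ROBOTS_TXT, "http://www.example.com/index.htm"))
```

This only tells you whether a well-behaved crawler may fetch the page. As the comments below point out, that is not the same as keeping the page out of the index.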

That's it! If you want to get more advanced, you can check out here, here, or here for a lot more info. Also, you can find free tools online that will generate a robots.txt file for you (for example, here).

Donut
  • Thanks Sam! Added your link next to the other tutorial. – Donut Oct 29 '10 at 19:50
  • Thanks a lot! Must I include robots.txt somewhere in the header? Or is it enough to just drop it into the root of the website? – Proud Member Oct 29 '10 at 19:53
  • Nope, you don't need to include it in a header; it's enough to just put it in your root directory. – Donut Oct 29 '10 at 20:01
  • 1
    According to this blog article: https://www.beussery.com/blog/index.php/2014/06/robots-txt-disallow-20 the information in this post is not correct. The robots.txt file will prevent search engines from crawling the page, but they will still index it. The best solution is to use meta robots tag. See answers below. – jligda Jan 15 '16 at 13:55
  • Downvoted. You said "You need a robots.txt", but other answers have clearly indicated that a robots.txt isn't a necessity. – barlop Jun 16 '18 at 08:12
5

You can set up a robots.txt file to try to tell search engines to ignore certain directories or files.

See here for more info.

Basically:

User-agent: *
Disallow: /[directory or file here]
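If you have several paths to exclude, the file can be generated rather than typed by hand. A rough sketch, assuming one rule group for a single user agent; `build_robots_txt` and the sample paths are hypothetical.

```python
def build_robots_txt(disallowed_paths, user_agent="*"):
    """Return robots.txt text with one Disallow line per path."""
    lines = [f"User-agent: {user_agent}"]
    for path in disallowed_paths:
        # Paths in robots.txt are written relative to the site root.
        if not path.startswith("/"):
            path = "/" + path
        lines.append(f"Disallow: {path}")
    return "\n".join(lines) + "\n"


print(build_robots_txt(["/imprint-page.htm", "private/"]))
```

Save the result as robots.txt in the root directory of the site.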
Bryan Denny
3

Nowadays, the best method is to use a robots meta tag and set it to noindex,follow:

<meta name="robots" content="noindex, follow">
p.campbell
Jérôme Verstrynge
3
<meta name="robots" content="noindex, nofollow">

Just include this line inside the <head> element of your page. I recommend this because robots.txt is a poor way to hide URLs such as login pages or other protected URLs that you don't want to show to other people or to search engines.

Anyone can open the robots.txt file directly on your website and see which URLs you consider secret. So what is the point of using robots.txt for this?

The better way is to include the meta tag above, which does not advertise the URL to anyone.
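To illustrate how little effort it takes to read those paths out of a public robots.txt, here is a small sketch. The function name `disallowed_paths` and the sample file contents are invented for this example.

```python
def disallowed_paths(robots_txt):
    """List every non-empty Disallow value found in robots.txt text."""
    paths = []
    for raw_line in robots_txt.splitlines():
        line = raw_line.split("#", 1)[0].strip()  # drop trailing comments
        field, sep, value = line.partition(":")
        if sep and field.strip().lower() == "disallow" and value.strip():
            paths.append(value.strip())
    return paths


# Hypothetical file contents; any visitor can download the real one.
sample = """\
User-agent: *
Disallow: /admin/login.php   # meant to stay hidden
Disallow: /imprint-page.htm
"""

print(disallowed_paths(sample))
```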

Mr Lister
Tahir Afridi
0

Suppose a robot wants to visit a website URL, say http://www.example.com/welcome.html. Before it does so, it first checks http://www.example.com/robots.txt, where you can explicitly disallow pages:

User-agent: *
Disallow: /~joe/junk.html

Please see the following link for details: robots.txt
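The lookup described above always targets /robots.txt at the root of the host, whatever page is requested. A minimal sketch of that step, with `robots_txt_url` as a hypothetical helper name:

```python
from urllib.parse import urlsplit, urlunsplit


def robots_txt_url(page_url):
    """Return the robots.txt URL a crawler would check before fetching page_url."""
    parts = urlsplit(page_url)
    # Keep scheme and host; replace path, query, and fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


print(robots_txt_url("http://www.example.com/welcome.html"))
# → http://www.example.com/robots.txt
```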

VISHNU
0

Create a robots.txt file and set the controls there.

Here are the docs for google: http://code.google.com/web/controlcrawlindex/docs/robots_txt.html

Sologoub