Questions tagged [robots.txt]

Robots.txt (the Robots Exclusion Protocol) is a text file placed in the root of a web site domain to give instructions to compliant web robots (such as search engine crawlers) about what pages to crawl and not crawl, as well as other information such as a Sitemap location. In modern frameworks it can be useful to programmatically generate the file. General questions about Search Engine Optimization are more appropriate on the Webmasters StackExchange site.

Website owners use the /robots.txt file to give instructions about their site to web robots; this is called The Robots Exclusion Protocol.

It works like this: a robot wants to visit a website URL, say http://www.example.com/welcome.html. Before it does so, it first checks for http://www.example.com/robots.txt, and finds:

User-agent: *
Disallow: /

The "User-agent: *" means this section applies to all robots. The "Disallow: /" tells the robot that it should not visit any pages on the site.
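
A more selective file can mix directives: block specific directories for all robots and advertise a Sitemap location. The paths and sitemap URL below are only illustrative:

```
User-agent: *
Disallow: /cgi-bin/
Disallow: /tmp/
Sitemap: http://www.example.com/sitemap.xml
```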

There are two important considerations when using /robots.txt:

  • robots can ignore your /robots.txt; in particular, malware robots that scan the web for security vulnerabilities and email-address harvesters used by spammers will pay no attention to it.
  • the /robots.txt file is a publicly available file. Anyone can see what sections of your server you don't want robots to use, so don't try to use /robots.txt to hide information.

More information can be found at http://www.robotstxt.org/.
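For checking URLs against these rules programmatically, Python's standard library ships a compliant parser in urllib.robotparser. This is a minimal sketch; the rules and example.com URLs are only illustrative:

```python
# Test URLs against robots.txt rules using the standard-library parser.
from urllib.robotparser import RobotFileParser

# Rules as they would appear in the fetched robots.txt file.
rules = [
    "User-agent: *",
    "Disallow: /private/",
]

rp = RobotFileParser()
rp.parse(rules)

# can_fetch(useragent, url) applies the matching User-agent section.
print(rp.can_fetch("MyBot", "http://www.example.com/welcome.html"))       # True
print(rp.can_fetch("MyBot", "http://www.example.com/private/data.html"))  # False
```

In practice you would call `rp.set_url(".../robots.txt")` and `rp.read()` to fetch the live file instead of supplying the lines yourself.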

1426 questions
0
votes
1 answer

How can I exclude certain paths from being crawled/indexed?

I have the following URL structure on my website: user accounts: http://www.mydomain.com/username; user accounts may contain items under: http://www.mydomain.com/username/item/itemId. What do I have to set in my robots.txt so that the user accounts…
Michael
  • 32,527
  • 49
  • 210
  • 370
0
votes
2 answers

How to setup robots.txt on multi-site VPS

So I have a VPS (running Debian) set up to host a number of sites I'm working on, with a structure like…
0
votes
1 answer

Will setting noindex/nofollow on parent pages affect site SEO for child pages?

This is a two-part question. Each parent page links to the first child page. The parent pages will not have any content; they will serve as main menu links, site URL structure, and site hierarchy. My website (WP) structure is as…
Nick Rivers
  • 294
  • 1
  • 6
  • 17
0
votes
2 answers

How to Add robots.txt to a Vaadin 7 Application with CDI-Integration?

How can I add a robots.txt file to a Vaadin application? I found nearly nothing related, but what I found states that there is no support for such a file. I'm using Vaadin 7.1.1 with JBoss 7.1.1 and Vaadin-CDI-Integration. My workaround approach…
aboger
  • 2,214
  • 6
  • 33
  • 47
0
votes
1 answer

How do I only allow crawlers to visit a part of the site?

I've got an ajax rich website which has extensive _escaped_fragment_ portions for Ajax indexing. While all my _escaped_fragment_ urls do 301 redirects to a special module which then outputs the HTML snapshots the crawlers need (i.e.…
Swader
  • 11,387
  • 14
  • 50
  • 84
0
votes
1 answer

Robots.txt http://example.com vs. http://www.example.com

I have a situation where we have two code bases that need to stay intact. Example: http://example.com, and a new site http://www.example.com. The old site (no WWW) supports some legacy code and has the rule: User-agent: * Disallow: / But in the…
g00se0ne
  • 4,560
  • 2
  • 21
  • 14
0
votes
1 answer

Unlist a subdomain or directory according to robotstxt.org

According to robotstxt.org The first answer is a workaround: You could put all the files you don't want robots to visit in a separate sub directory, make that directory un-listable on the web (by configuring your server) How do I configure my…
EGHDK
  • 17,818
  • 45
  • 129
  • 204
0
votes
1 answer

How to deindex a specific category in OpenCart through the robots.txt file

Hello, if I am not wrong, the robots.txt file for OpenCart will be this: User-agent: * Disallow: /*&limit Disallow: /*&sort Disallow: /*?route=checkout/ Disallow: /*?route=account/ Disallow: /*?route=product/search Disallow: /*?route=affiliate/ Allow: / I…
0
votes
1 answer

Made changes to robots.txt but search engines still say description not available

Most of the questions I see are trying to hide the site from being indexed by search engines. For myself, I'm attempting the opposite. For the robots.txt file, I've put the following: # robots.txt User-agent: * Allow: / # End robots.txt…
Nina
  • 1,037
  • 10
  • 19
0
votes
2 answers

Need to block some URL from robots file

I would like to disallow some URLs in the robots file of my website and am having some difficulty. Right now my robots file has the following content: User-agent: * Allow: / Disallow: /cgi-bin/ Sitemap: http://seriesgate.tv/sitemap.xml I do not want…
alikarimi
  • 1
  • 1
0
votes
1 answer

How to prohibit bot access to physical location of robots.txt for multi-site?

If I have the following in my .htaccess: (disallow bots from going to /dir1/dir2) Disallow: /dir1/dir2 And I have in my .htaccess: (when accessing robots.txt, pipe them the data from dir1/dir2/robots.txt) RewriteCond %{HTTP_HOST}…
Lakitu
  • 424
  • 1
  • 4
  • 12
0
votes
1 answer

Allow crawling of only the home page of a sub-directory using robots.txt

I have www.example.com with WordPress and www.example.com/sitetwo with another WordPress. I would like to allow crawling of the entire example.com but only the home page of example.com/sitetwo. What do I have to write in my robots.txt?
michele
  • 26,348
  • 30
  • 111
  • 168
0
votes
2 answers

Grails Files in Root not found

I have a Grails app and want to make a robots.txt and sitemap.xml file. I read that the best way to put them into the application is in the web-app folder. When I run the site locally and test http://mysite/app/robots.txt everything works, but when…
skaz
  • 21,962
  • 20
  • 69
  • 98
0
votes
1 answer

Robots interpreting script tags

Our web application is currently crawled by a multitude of robots. However, some of them seem to try and parse javascript tags and interpret some of it as links, which are called and fill our error log with loads of 404s. On our pages we have…
Thomas
  • 87,414
  • 12
  • 119
  • 157
0
votes
1 answer

Best way to prevent Google from indexing a directory

I've researched many methods on how to prevent Google/other search engines from crawling a specific directory. The two most popular ones I've seen are: Adding it into the robots.txt file: Disallow: /directory/ Adding a meta tag:
user2154729
  • 97
  • 1
  • 9