Questions tagged [robots.txt]

Robots.txt (the Robots Exclusion Protocol) is a text file placed in the root of a website's domain to give instructions to compliant web robots (such as search engine crawlers) about which pages to crawl and not crawl, as well as other information such as a Sitemap location. In modern frameworks it can be useful to generate the file programmatically. General questions about search engine optimization are better suited to the Webmasters Stack Exchange site.
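As a minimal sketch of programmatic generation (the helper name and the paths are hypothetical, not from any particular framework):

```python
def build_robots_txt(disallow_paths, sitemap_url=None):
    """Assemble robots.txt content from caller-supplied paths and sitemap URL."""
    lines = ["User-agent: *"]
    # One Disallow line per path the caller wants excluded.
    lines += [f"Disallow: {path}" for path in disallow_paths]
    if sitemap_url:
        lines.append(f"Sitemap: {sitemap_url}")
    return "\n".join(lines) + "\n"

print(build_robots_txt(["/admin/", "/tmp/"], "http://www.example.com/sitemap.xml"))
```

A framework route or controller would then serve this string as `text/plain` at `/robots.txt`.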

Website owners use the /robots.txt file to give instructions about their site to web robots; this is called The Robots Exclusion Protocol.

It works like this: a robot wants to visit a website URL, say http://www.example.com/welcome.html. Before it does so, it first checks for http://www.example.com/robots.txt, and finds:

User-agent: *
Disallow: /

The "User-agent: *" means this section applies to all robots. The "Disallow: /" tells the robot that it should not visit any pages on the site.
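This check can be reproduced with Python's standard-library urllib.robotparser, here parsing the example rules above directly rather than fetching them over HTTP:

```python
from urllib.robotparser import RobotFileParser

# The example rules shown above, as a list of lines.
rules = ["User-agent: *", "Disallow: /"]

parser = RobotFileParser()
parser.parse(rules)

# Every path is disallowed for every robot, so this prints False.
print(parser.can_fetch("*", "http://www.example.com/welcome.html"))
```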

There are two important considerations when using /robots.txt:

  • robots can ignore your /robots.txt. In particular, malware robots that scan the web for security vulnerabilities, and email address harvesters used by spammers, will pay no attention to it.
  • the /robots.txt file is a publicly available file. Anyone can see what sections of your server you don't want robots to use, so don't try to use /robots.txt to hide information.

More information can be found at http://www.robotstxt.org/.

1426 questions
28
votes
11 answers

Meta tag vs robots.txt

Is it better to use meta tags* or the robots.txt file for informing spiders/crawlers to include or exclude a page? Are there any issues in using both the meta tags and the robots.txt? *E.g.: <meta name="robots" content="index, follow">
keruilin
27
votes
4 answers

How to add `nofollow, noindex` all pages in robots.txt?

I want to add nofollow and noindex to my site whilst it's being built. The client has requested I use these rules. I am aware of But I only have access to the robots.txt file. Does anyone know the…
MeltingDog
26
votes
4 answers

Stopping index of Github pages

I have a GitHub page from my repository, username.github.io. However, I do not want Google to crawl my website and absolutely do not want it to show up on search results. Will just using robots.txt in GitHub Pages work? I know there are…
user2961712
25
votes
5 answers

Robots.txt: allow only major SE

Is there a way to configure the robots.txt so that the site accepts visits ONLY from Google, Yahoo! and MSN spiders?
vyger
24
votes
2 answers

robots.txt file for different domains of same site

I have an ASP.NET MVC 4 web application that can be accessed from multiple different domains. The site is fully localized based on the domain in the request (similar in concept to this question). I want to include a robots.txt file and I want to…
amateur
24
votes
3 answers

Serving sitemap.xml and robots.txt with Spring MVC

What is the best way to serve sitemap.xml and robots.txt with Spring MVC? I want to serve these files through a controller in the cleanest way.
michal.kreuzman
23
votes
5 answers

How to set up a robot.txt which only allows the default page of a site

Say I have a site on http://example.com. I would really like to allow bots to see the home page, but any other page needs to be blocked, as it is pointless to spider them. In other words, http://example.com & http://example.com/ should be allowed, but…
Boaz
23
votes
3 answers

Does robots.txt apply to subdomains?

Let's say I have a test folder (test.domain.com) and I don't want the search engines to crawl in it, do I need to have a robots.txt in the test folder or can I just place a robots.txt in the root, then just disallow the test folder?
Pa3k.m
23
votes
6 answers

How do I configure nginx to redirect to a URL for robots.txt & sitemap.xml

I am running nginx 0.6.32 as a proxy front-end for couchdb. I have my robots.txt in the database, reachable as http://www.example.com/prod/_design/mydesign/robots.txt. I also have my sitemap.xml which is dynamically generated, on a similar url. I…
timbo
22
votes
3 answers

Robots.txt Allow sub folder but not the parent

Can anybody please explain the correct robots.txt command for the following scenario. I would like to allow access to: /directory/subdirectory/.. But I would also like to restrict access to /directory/ notwithstanding the above exception.
QFDev
21
votes
3 answers

Angular2 + webpack do not deploy robots.txt

I am creating a web site with Angular2@2.1.2. I am using Webpack with default settings (as a dependency). Here is my package.json "dependencies": { "@angular/common": "2.1.2", "@angular/compiler": "2.1.2", "@angular/core": "2.1.2", "@angular/forms":…
Guymage
20
votes
2 answers

Ruby on Rails robots.txt folders

I'm about to launch a Ruby on Rails application and as the last task, I want to set the robots.txt file. I couldn't find information about how the paths should be written properly for a Rails application. Is the starting path always the root path…
Linus
20
votes
1 answer

Robots.txt - What is the proper format for a Crawl Delay for multiple user agents?

Below is a sample robots.txt file to Allow multiple user agents with multiple crawl delays for each user agent. The Crawl-delay values are for illustration purposes and will be different in a real robots.txt file. I have searched all over the web…
Sammy
19
votes
5 answers

How to prevent staging from being indexed in search engines

I would like my staging websites to not be indexed by search engines (Google first of all). I have heard WordPress is good at doing this, but I would like to be technology agnostic. Is robots.txt enough? We would like to keep anonymous…
toutpt
18
votes
4 answers

How to stop search engines from crawling the whole website?

I want to stop search engines from crawling my whole website. I have a web application for members of a company to use. This is hosted on a web server so that the employees of the company can access it. No one else (the public) would need it or…
Iain Simpson