Questions tagged [robots.txt]

Robots.txt (the Robots Exclusion Protocol) is a text file placed at the root of a website to give instructions to compliant web robots (such as search engine crawlers) about which pages to crawl and which not to crawl, as well as other information such as a Sitemap location. In modern frameworks it can be useful to generate the file programmatically. General questions about Search Engine Optimization are more appropriate on the Webmasters Stack Exchange site.

Website owners use the /robots.txt file to give instructions about their site to web robots; this is called The Robots Exclusion Protocol.

It works like this: a robot wants to visit a website URL, say http://www.example.com/welcome.html. Before it does so, it first checks for http://www.example.com/robots.txt, and finds:

User-agent: *
Disallow: /

The "User-agent: *" means this section applies to all robots. The "Disallow: /" tells the robot that it should not visit any pages on the site.

There are two important considerations when using /robots.txt:

  • Robots can ignore your /robots.txt. In particular, malware robots that scan the web for security vulnerabilities, and email address harvesters used by spammers, will pay no attention to it.
  • The /robots.txt file is publicly available. Anyone can see which sections of your server you don't want robots to visit, so don't try to use /robots.txt to hide information.

More information can be found at http://www.robotstxt.org/.
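As a minimal sketch of how a compliant crawler performs the check described above, Python's standard urllib.robotparser module does the same fetch-then-ask dance (the URLs are the example.com placeholders from this page):

from urllib import robotparser

# Fetch and parse the site's robots.txt
rp = robotparser.RobotFileParser()
rp.set_url("http://www.example.com/robots.txt")
rp.read()

# Ask whether a given user agent may fetch a given URL;
# with "Disallow: /" under "User-agent: *" this prints False
print(rp.can_fetch("*", "http://www.example.com/welcome.html"))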

1426 questions
11
votes
2 answers

How to disable robots.txt when you launch scrapy shell?

I use the Scrapy shell without problems on several websites, but I run into problems when robots.txt does not allow access to a site. How can I make Scrapy ignore robots.txt entirely? Thanks in advance. I'm not talking…
DARDAR SAAD
  • 392
  • 1
  • 3
  • 17
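For reference, Scrapy's robots.txt compliance is governed by its ROBOTSTXT_OBEY setting; a minimal sketch of the two usual ways to switch it off:

# settings.py: disable robots.txt handling project-wide
ROBOTSTXT_OBEY = False

# or override it for a single shell session from the command line:
# scrapy shell -s ROBOTSTXT_OBEY=False "http://www.example.com/"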
11
votes
7 answers

robots.txt; What encoding?

I am about to create a robots.txt file. I am using Notepad. How should I save the file? UTF-8, ANSI, or what? Also, should it be a capital R? And in the file I am specifying a sitemap location; should this be with a capital S? User-agent: * …
user188962
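For what it's worth: Google's documentation and the current spec (RFC 9309) expect the file to be UTF-8 encoded and served as all-lowercase robots.txt, while directive names such as Sitemap are case-insensitive, so the capital S is convention rather than requirement. A minimal file of the kind the question describes (the sitemap URL is a placeholder):

User-agent: *
Disallow:
Sitemap: http://www.example.com/sitemap.xml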
10
votes
2 answers

Listing both sitemaps and sitemap index files in robots.txt?

My site consists of 3 main sections: Reviews, Forum, and Blog. I have plugins for the forum and blog that automatically generate sitemaps for them. The forum plugin generates a sitemap INDEX file pointing to multiple indexes, and the blog plugin…
Chris
  • 1,273
  • 5
  • 19
  • 33
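Multiple Sitemap lines may appear in a single robots.txt, and each may point at either a plain sitemap or a sitemap index file, so mixing the two is fine. A sketch with placeholder URLs:

Sitemap: http://www.example.com/forum-sitemap-index.xml
Sitemap: http://www.example.com/blog-sitemap.xml
Sitemap: http://www.example.com/reviews-sitemap.xml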
10
votes
2 answers

Robots.txt priority question

If I have these lines in robots.txt: Disallow /folder/ Allow /folder/filename.php Will filename.php be allowed then? In which order does Google prioritize the lines? And what will happen here, for example?: Allow / Disallow / I am mainly referring to…
user188962
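As background: the original exclusion standard has no Allow directive at all, and crawlers that do support it differ in evaluation order. Google documents most-specific-match-wins (the longest matching path takes precedence), so with colons added the quoted pair would leave the file crawlable for Googlebot:

User-agent: *
Disallow: /folder/
Allow: /folder/filename.php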
10
votes
3 answers

Sitemap for a site with a large number of dynamic subdomains

I'm running a site which allows users to create subdomains. I'd like to submit these user subdomains to search engines via sitemaps. However, according to the sitemaps protocol (and Google Webmaster Tools), a single sitemap can include URLs from a…
bartekb
  • 203
  • 6
  • 14
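One commonly cited workaround is the cross-submission mechanism from sitemaps.org: each subdomain's robots.txt may reference a sitemap hosted elsewhere, and that sitemap is then treated as authorized for the subdomain that referenced it. A sketch with placeholder hostnames:

# robots.txt served on user1.example.com
Sitemap: http://sitemaps.example.com/user1-sitemap.xml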
10
votes
2 answers

How to block search engines from indexing all URLs beginning with origin.domainname.com

I have www.domainname.com and origin.domainname.com pointing to the same codebase. Is there a way I can prevent all URLs under origin.domainname.com from getting indexed? Is there some rule in robots.txt to do it? Both the URLs are pointing to…
Loveleen Kaur
  • 993
  • 4
  • 16
  • 36
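Because robots.txt is fetched per hostname, a single file cannot distinguish the two domains; the usual approach is to serve a different robots.txt on each one. A sketch of the blocking variant, using the hostname from the question:

# robots.txt returned only for origin.domainname.com
User-agent: *
Disallow: /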
10
votes
2 answers

Should I use different case-spellings for case-insensitive directories in robots.txt?

Unfortunately, I’ve got case-insensitive servers that cannot be replaced in the short term. Some directories need to be excluded from crawling, so I have to Disallow them in my robots.txt. Let’s take /Img/ as an example. If I keep it all lower…
dakab
  • 5,379
  • 9
  • 43
  • 67
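Path matching in robots.txt is case-sensitive from the crawler's point of view, so on a case-insensitive server the defensive answer is to list every spelling that might appear in links. A sketch for the /Img/ example:

User-agent: *
Disallow: /Img/
Disallow: /img/
Disallow: /IMG/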
10
votes
5 answers

Is it possible to list multiple user-agents in one line?

Is it possible in robots.txt to give one instruction to multiple bots without repeatedly having to mention it? Example: User-agent: googlebot yahoobot microsoftbot Disallow: /boringstuff/
elhombre
  • 2,839
  • 7
  • 28
  • 28
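Several bots cannot share one User-agent line, but a single group may stack several User-agent lines that all apply to the rules beneath them. A sketch using the bot names from the question:

User-agent: googlebot
User-agent: yahoobot
User-agent: microsoftbot
Disallow: /boringstuff/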
10
votes
5 answers

How to allow crawlers access to index.php only, using robots.txt?

If I want to allow crawlers to access only index.php, will this work? User-agent: * Disallow: / Allow: /index.php
todd
  • 101
  • 1
  • 1
  • 3
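For crawlers that support Allow with most-specific matching (Google and Bing document this behavior), the combination in the question should work as intended regardless of line order; older first-match crawlers may read it differently:

User-agent: *
Allow: /index.php
Disallow: /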
9
votes
1 answer

How to configure robots.txt file to block all but 2 directories

I don't want search engines to index most of my website. I do, however, want them to index 2 folders (and their children). This is what I set up, but I don't think it works; I still see pages in Google that I wanted to hide: Here's…
jeph perro
  • 6,242
  • 26
  • 90
  • 124
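A minimal sketch of the usual pattern, with placeholder folder names. Note that robots.txt only stops future crawling; pages already in Google's index may need a noindex tag or a removal request before they disappear:

User-agent: *
Allow: /public-folder-1/
Allow: /public-folder-2/
Disallow: /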
9
votes
2 answers

Why does Chrome request a robots.txt?

I have noticed in my logs that Chrome requested a robots.txt alongside everything else I expected it to request. [...] 2017-09-17 15:22:35 - (sanic)[INFO]: Goin' Fast @ http://0.0.0.0:8080 2017-09-17 15:22:35 - (sanic)[INFO]: Starting worker [26704] 2017-09-17…
zython
  • 1,176
  • 4
  • 22
  • 50
9
votes
2 answers

Robots.txt file in MVC.NET 4

I have read an article about hiding some URLs from robots in my ASP.NET MVC project. In the article, the author says we should add an action like this to some of our controllers. In this example he adds the action to the Home Controller: #region…
Behzad Hassani
  • 2,129
  • 4
  • 30
  • 51
9
votes
1 answer

What does the dollar sign mean in robots.txt

I am curious about a website and want to do some web crawling at the /s path. Its robots.txt: User-Agent: * Allow: /$ Allow: /debug/ Allow: /qa/ Allow: /wiki/ Allow: /cgi-bin/loginpage Disallow: / My questions are: What does the dollar-sign mean…
夜一林风
  • 1,247
  • 1
  • 13
  • 24
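In the pattern-matching extensions supported by the major engines (not part of the original standard), $ anchors the pattern at the end of the URL. Allow: /$ therefore matches only the bare homepage, while Disallow: / blocks everything else, so the quoted file permits the root page plus the explicitly allowed sections:

User-agent: *
Allow: /$
Disallow: /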
9
votes
2 answers

Block bingbot from crawling my site

I would like to completely block Bing from crawling my site for now (it's hitting my site at an alarming rate: 500 GB of data a month). I have 1000 subdomains added to Bing Webmaster Tools, so I can't go and set each one's crawl rate. I have tried…
Zoinky
  • 4,083
  • 11
  • 40
  • 78
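At the robots.txt level there are two levers: block bingbot outright, or throttle it with Crawl-delay, which Bing has documented support for. A sketch of both (pick one):

User-agent: bingbot
Disallow: /

# alternatively, slow it down instead of blocking it:
# User-agent: bingbot
# Crawl-delay: 10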
9
votes
1 answer

Nginx: different robots.txt for alternate domain

Summary I have a single web app with an internal and external domain pointing at it, and I want a robots.txt to block all access to the internal domain, but allow all access to the external domain. Problem Detail I have a simple Nginx server block…
Joe J
  • 9,985
  • 16
  • 68
  • 100
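Since each hostname is served its own robots.txt, the usual fix is to make the server return a different body depending on the requested domain; the two file variants themselves are simple (a sketch):

# robots.txt for the internal domain: block everything
User-agent: *
Disallow: /

# robots.txt for the external domain: allow everything
User-agent: *
Disallow: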