Questions tagged [robots.txt]

Robots.txt (the Robots Exclusion Protocol) is a text file placed at the root of a website's domain to give instructions to compliant web robots (such as search engine crawlers) about which pages to crawl and which to skip, as well as other information such as a Sitemap location. In modern frameworks it can be useful to generate the file programmatically. General questions about search engine optimization are better suited to the Webmasters Stack Exchange site.

Website owners use the /robots.txt file to give instructions about their site to web robots; this is called The Robots Exclusion Protocol.

It works like this: a robot wants to visit a website URL, say http://www.example.com/welcome.html. Before it does so, it first checks for http://www.example.com/robots.txt, and finds:

User-agent: *
Disallow: /

The "User-agent: *" means this section applies to all robots. The "Disallow: /" tells the robot that it should not visit any pages on the site.

There are two important considerations when using /robots.txt:

  • Robots can ignore your /robots.txt. In particular, malware robots that scan the web for security vulnerabilities, and email-address harvesters used by spammers, will pay no attention to it.
  • The /robots.txt file is publicly available. Anyone can see which sections of your server you don't want robots to use, so don't try to use /robots.txt to hide information (see the sketch after this list).
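
Fetching a site's crawl rules takes one request, because robots.txt is served like any other page. A minimal sketch using only the standard library (the domain is a placeholder; substitute the site you want to inspect):

from urllib.request import urlopen

# robots.txt is world-readable by design; anyone can retrieve it.
print(urlopen("https://www.example.com/robots.txt").read().decode("utf-8"))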

More information can be found at http://www.robotstxt.org/.

1426 questions
18
votes
1 answer

Is this robots.txt syntax with an empty "Disallow:" correct?

Today whilst improving my web crawler to support the robots.txt standard, I came across the following code at http://www.w3schools.com/robots.txt User-agent: Mediapartners-Google Disallow: Is this syntax correct? Shouldn't it be Disallow: / or…
dangee1705
  • 3,445
  • 1
  • 21
  • 40
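
An empty Disallow: is valid under the original standard and means "disallow nothing", i.e. that agent may crawl everything. A quick check with Python's standard urllib.robotparser (the URL is illustrative):

from urllib import robotparser

rp = robotparser.RobotFileParser()
rp.parse([
    "User-agent: Mediapartners-Google",
    "Disallow:",
])

# An empty Disallow value blocks nothing, so the fetch is allowed:
print(rp.can_fetch("Mediapartners-Google", "http://www.w3schools.com/any/page.html"))  # True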
17
votes
4 answers

Is it possible to control the crawl speed by robots.txt?

We can tell bots to crawl or not to crawl our website in robots.txt. On the other hand, we can control the crawling speed in Google Webmasters (how much Googlebot crawls the website). I wonder if it is possible to limit the crawler activities by…
Googlebot
  • 15,159
  • 44
  • 133
  • 229
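
Partly. The non-standard Crawl-delay directive is honored by some crawlers (Bing and Yandex document support for it), but Googlebot ignores it; Google's crawl rate is set in its webmaster console instead. Python's urllib.robotparser exposes the value (Python 3.6+), as this sketch shows:

from urllib import robotparser

rp = robotparser.RobotFileParser()
rp.parse([
    "User-agent: *",
    "Crawl-delay: 10",
])

# crawl_delay() returns the delay in seconds, or None if unset:
print(rp.crawl_delay("*"))  # 10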
16
votes
2 answers

HTTP header to detect a preload request by Google Chrome

Google Chrome 17 introduced a new feature which preloads a webpage to improve rendering speed upon actually making the request (hitting enter in the omnibar). Two questions: Is there an HTTP header to detect such a request on the server side, and if one…
oxygen
  • 5,891
  • 6
  • 37
  • 69
16
votes
2 answers

How can I serve robots.txt on an SPA using React with Firebase hosting?

I have an SPA built using create-react-app and wish to have a robots.txt like this: http://example.com/robots.txt I see on this page that: You need to make sure your server is configured to catch any URL after it's configured to serve from a…
WilliamKF
  • 41,123
  • 68
  • 193
  • 295
16
votes
3 answers

How do I prevent Bing from swamping my site with traffic irregularly?

Bingbot will hit my site pretty hard for a couple of hours each day, and will be extremely light for the rest of the time. I'd either like to smooth out its crawls, reduce its rate limit, or block it altogether. It doesn't really send through any…
Tim Haines
  • 1,496
  • 3
  • 14
  • 16
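
Since Bing documents support for Crawl-delay, one option is a bingbot-specific section in robots.txt (the delay value here is only an example):

User-agent: bingbot
Crawl-delay: 10

Bing Webmaster Tools also has a crawl-control setting that can shift Bingbot's crawling toward specific hours of the day, which matches this burst pattern more directly than a flat delay.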
16
votes
1 answer

Robots.txt syntax not understood

I submitted my robots.txt file ages ago to Google and it is still giving me a syntax not understood for the first line. After Googling, the most common problem is Google adding a '?' at the start of the line, but it isn't doing that to me. The URL to…
Lex Eichner
  • 1,056
  • 3
  • 10
  • 35
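
One frequent cause of a first-line "Syntax not understood" is an invisible UTF-8 byte-order mark at the start of the file, which makes the first directive unrecognizable to the parser. A small sketch that detects and strips it (the file path is illustrative):

with open("robots.txt", "rb") as f:
    data = f.read()

if data.startswith(b"\xef\xbb\xbf"):  # UTF-8 BOM
    with open("robots.txt", "wb") as f:
        f.write(data[3:])  # rewrite the file without the BOM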
15
votes
5 answers

robots.txt in subdirectory

I have a project that lies in a folder below the main domain, and I don't have access to the root of the domain itself. http://mydomain.com/myproject/ I want to disallow indexing on the subfolder…
magnattic
  • 12,638
  • 13
  • 62
  • 115
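
Crawlers only ever request /robots.txt at the domain root, so a robots.txt inside a subfolder is ignored. Without root access, one alternative is a noindex X-Robots-Tag response header on the subfolder's pages; this WSGI middleware sketch (the prefix and names are illustrative) shows the idea:

def noindex_subfolder(app, prefix="/myproject/"):
    # Wrap a WSGI app; add X-Robots-Tag to every response under `prefix`.
    def middleware(environ, start_response):
        def start(status, headers, exc_info=None):
            if environ.get("PATH_INFO", "").startswith(prefix):
                headers.append(("X-Robots-Tag", "noindex, nofollow"))
            return start_response(status, headers, exc_info)
        return app(environ, start)
    return middleware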
14
votes
2 answers

Web Crawler - Ignore Robots.txt file?

Some servers have a robots.txt file in order to stop web crawlers from crawling through their websites. Is there a way to make a web crawler ignore the robots.txt file? I am using Mechanize for Python.
Craig Locke
  • 755
  • 4
  • 8
  • 12
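
In Mechanize the robots.txt check is a handler that can be switched off, though doing so is the crawler operator's responsibility. A minimal sketch (the URL is illustrative):

import mechanize

br = mechanize.Browser()
br.set_handle_robots(False)  # stop Mechanize from honoring robots.txt
response = br.open("http://www.example.com/")
print(response.code)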
14
votes
1 answer

Generating a dynamic /robots.txt file in a Next.js app

I need a way to respond dynamically to the /robots.txt request, and that's why I've decided to go with getServerSideProps. https://nextjs.org/docs/basic-features/data-fetching#getserversideprops-server-side-rendering If you export an async function…
cbdeveloper
  • 27,898
  • 37
  • 155
  • 336
14
votes
4 answers

Disallow or Noindex on Subdomain with robots.txt

I have dev.example.com and www.example.com hosted on different subdomains. I want crawlers to drop all records of the dev subdomain but keep them on www. I am using git to store the code for both, so ideally I'd like both sites to use the same…
Kirk Ouimet
  • 27,280
  • 43
  • 127
  • 177
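
Because each host is asked for its own /robots.txt, one codebase can serve different rules to dev. and www. by inspecting the Host header. A sketch of that idea using Flask (hostnames and rules are illustrative):

from flask import Flask, Response, request

app = Flask(__name__)

@app.route("/robots.txt")
def robots():
    if request.host.startswith("dev."):
        body = "User-agent: *\nDisallow: /\n"   # block everything on dev
    else:
        body = "User-agent: *\nDisallow:\n"     # allow everything on www
    return Response(body, mimetype="text/plain")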
13
votes
5 answers

Facebook and Crawl-delay in Robots.txt?

Do Facebook's web-crawling bots respect the Crawl-delay: directive in robots.txt files?
artlung
  • 33,305
  • 16
  • 69
  • 121
13
votes
2 answers

Robots.txt, how to allow access only to domain root, and no deeper?

I want to allow crawlers to access my domain's root directory (i.e. the index.html file), but nothing deeper (i.e. no subdirectories). I do not want to have to list and deny every subdirectory individually within the robots.txt file. Currently I…
WASa2
  • 131
  • 1
  • 3
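
Google's parser supports an Allow directive and a $ end-of-URL anchor, both extensions to the original standard (so not every crawler obeys them). Combined, they permit exactly the root and block everything deeper:

User-agent: *
Allow: /$
Disallow: /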
12
votes
6 answers

How to make a private URL?

I want to create a private URL such as http://domain.com/content.php?secret_token=XXXXX Then, only visitors who have the exact URL (e.g. received by email) can see the page. We check the $_GET['secret_token'] before displaying the content. My problem is…
Googlebot
  • 15,159
  • 44
  • 133
  • 229
12
votes
1 answer

Can I use the “Host” directive in robots.txt?

Searching for specific information on robots.txt, I stumbled upon a Yandex help page on this topic. It suggests that I could use the Host directive to tell crawlers my preferred mirror domain: User-Agent: * Disallow: /dir/ Host:…
dakab
  • 5,379
  • 9
  • 43
  • 67
11
votes
6 answers

Rendering plain text through PHP

For some reason, I want to serve my robots.txt via a PHP script. I have set up Apache so that the robots.txt file request (in fact, all file requests) comes to a single PHP script. The code I am using to render robots.txt is: echo "User-agent:…
JP19
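
Whatever the language, the key details when rendering robots.txt from a script are sending Content-Type: text/plain and emitting no stray whitespace before the first directive. A self-contained sketch of the same idea using only Python's standard library (the port and rules are illustrative):

from http.server import BaseHTTPRequestHandler, HTTPServer

class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path == "/robots.txt":
            body = b"User-agent: *\nDisallow: /admin/\n"  # generated rules
            self.send_response(200)
            self.send_header("Content-Type", "text/plain")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)
        else:
            self.send_error(404)

HTTPServer(("", 8000), Handler).serve_forever()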