Questions tagged [robots.txt]

Robots.txt (the Robots Exclusion Protocol) is a text file placed at the root of a website's domain to give instructions to compliant web robots (such as search engine crawlers) about which pages to crawl and not crawl, as well as other information such as a Sitemap location. In modern frameworks it can be useful to generate the file programmatically. General questions about Search Engine Optimization are more appropriate on the Webmasters Stack Exchange site.

Website owners use the /robots.txt file to give instructions about their site to web robots; this is called The Robots Exclusion Protocol.

It works like this: a robot wants to visit a website URL, say http://www.example.com/welcome.html. Before it does so, it first checks for http://www.example.com/robots.txt, and finds:

User-agent: *
Disallow: /

The "User-agent: *" means this section applies to all robots. The "Disallow: /" tells the robot that it should not visit any pages on the site.

There are two important considerations when using /robots.txt:

  • Robots can ignore your /robots.txt. In particular, malware robots that scan the web for security vulnerabilities, and email address harvesters used by spammers, will pay no attention to it.
  • The /robots.txt file is publicly available. Anyone can see which sections of your server you don't want robots to use, so don't try to use /robots.txt to hide information.

More information can be found at http://www.robotstxt.org/.

1426 questions
-2
votes
1 answer

How to stop users from accessing the robots.txt file on a website?

I need to stop users from accessing the robots.txt file on my website. I am not sure whether, if I add a 301 redirect for robots.txt in .htaccess, Google may discard the robots.txt, so please advise me about this.
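
One imperfect workaround, assuming Apache with mod_rewrite: serve robots.txt only to clients whose User-Agent resembles a known crawler. The bot names below are illustrative, and User-Agent strings are trivially spoofed, so this is obfuscation rather than access control:

RewriteEngine On
# Return 403 Forbidden for robots.txt unless the User-Agent matches
# one of these (illustrative, spoofable) crawler tokens.
RewriteCond %{HTTP_USER_AGENT} !(Googlebot|Bingbot|DuckDuckBot) [NC]
RewriteRule ^robots\.txt$ - [F]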
-2
votes
1 answer

How can I block bots from crawling user-generated URLs?

I am working on an eCommerce site, and it creates URLs after a user searches in the search bar or filters products. For example, if a user searches for an apple, the website will have a URL like example.com/s=apple? or something like that, and the same for if…
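
A sketch of one approach, assuming the crawlers you care about support wildcards in robots.txt (Google and Bing do; the original protocol does not). The parameter names here are placeholders for whatever the shop actually generates:

User-agent: *
Disallow: /*?s=
Disallow: /*?*filter=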
-2
votes
1 answer

How do I block TinEye spider?

I know how to block Google image search through robots.txt; is there something similar for TinEye? I do not want my sites to be indexed by them.
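
Assuming the crawler honors robots.txt, TinEye is blocked like any other bot, by matching its User-Agent token. TinEye has documented a crawler token of TinEye-bot; verify the current token against their documentation before relying on this:

User-agent: TinEye-bot
Disallow: /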
-2
votes
1 answer

How can I stop x-robots-tag from setting noindex on my entire site?

I have an up-to-date WordPress site, running WooCommerce and Yoast, that has the following noindex set in the headers on every single page: x-robots-tag: noindex, nofollow, nosnippet, noarchive. I'm not sure where it's coming from. The only…
Twentyonehundred
  • 2,199
  • 2
  • 17
  • 28
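
A useful first diagnostic is to confirm which responses actually send the header, since it may come from the server configuration, a plugin, or WordPress's "Discourage search engines" setting. A minimal standard-library sketch (example.com is a placeholder):

import urllib.request

# Request only the headers and print any X-Robots-Tag value.
req = urllib.request.Request("https://example.com/", method="HEAD")
with urllib.request.urlopen(req) as resp:
    print(resp.headers.get("X-Robots-Tag"))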
-2
votes
2 answers

How to block a certain type of URL in robots.txt or .htaccess?

Currently on my webshop, on category pages with many pages, the URLs end with https://www.example.com?p=2, p=3... I want to use robots.txt to stop URLs ending in p=Number from being indexed. How do I do this? It's a PrestaShop website, by the way.…
JohnDickinson
  • 97
  • 1
  • 3
  • 11
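
For crawlers that support wildcards (Google and Bing do), the paginated URLs can be matched on their query string. Note that robots.txt controls crawling rather than indexing, so rel="canonical" or a noindex header may fit the goal better; a sketch:

User-agent: *
Disallow: /*?p=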
-2
votes
1 answer

robots.txt format to disallow all sub-URLs but not the root URL itself

My application URLs are like the following: http://example.com/app/1 http://example.com/app/2 http://example.com/app/3 ... http://example.com/app/n. Now I want to block all these URLs from crawling, but not http://example.com/app. How can I do this…
user8073054
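
Because robots.txt rules are path prefixes, a trailing slash is enough: the rule below matches /app/1, /app/2, and so on, but not /app itself. A minimal sketch:

User-agent: *
Disallow: /app/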
-2
votes
1 answer

Google indexation

I have a newbie question about Google indexation. We created a web app on a LAMP stack, where the app and a WordPress landing page share the same server. My question is: when indexing the landing page, will Google robots go through the login link…
AB Moslih
  • 45
  • 1
  • 9
-2
votes
1 answer

Link Juice Prioritize www. and domain name

Our company ran a site audit on a very basic one-page website and is receiving this warning: "Normally, a webpage can be accessed with or without adding www. to its domain name. If you haven't specified which version should be prioritized,…
Benjamin
  • 697
  • 1
  • 8
  • 32
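
One common fix on Apache is a site-wide 301 redirect to the preferred host (here from the bare domain to www; swap the direction as needed, and treat example.com as a placeholder):

RewriteEngine On
# Permanently redirect example.com/... to www.example.com/...
RewriteCond %{HTTP_HOST} ^example\.com$ [NC]
RewriteRule ^(.*)$ https://www.example.com/$1 [R=301,L]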
-2
votes
1 answer

Allow and Disallow in Robots.txt

http://www.robotstxt.org/orig.html says: Disallow: /help disallows both /help.html and /help/index.html. Now, google.com/robots.txt lists: Disallow: /search Allow: /search/about. Upon running robotparser.py, it returns false for both the…
Romy
  • 11
  • 2
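
The observed behavior is reproducible: Python's urllib.robotparser applies rules in file order and the first match wins, so Disallow: /search shadows the later Allow: /search/about, whereas Google uses longest-match precedence. A small sketch:

from urllib.robotparser import RobotFileParser

rp = RobotFileParser()
rp.parse([
    "User-agent: *",
    "Disallow: /search",
    "Allow: /search/about",
])
# First matching rule wins in urllib.robotparser, so both are blocked;
# Google's longest-match semantics would allow /search/about.
print(rp.can_fetch("*", "/search"))        # False
print(rp.can_fetch("*", "/search/about"))  # False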
-2
votes
2 answers

Robots.txt Error on Google Search Console

While submitting my website's robots.txt to Google Search Console, it shows an error like the screenshot below.
-2
votes
1 answer

My robots.txt file is not reflecting the new content

I have uploaded a new robots.txt file through cPanel into the public_html directory, but when I browse it through a URL like www.mydomian.com/robots.txt, it's not showing the changes and not refreshing the new content. I have rechecked everything, I can…
-2
votes
1 answer

How to use robots.txt to disallow some controllers in CodeIgniter

I'm quite new to the topic of robots.txt. I looked into it for a few hours and tried to implement it. I have controllers named login and view. All I want is for Google search to list my view controller only, not the login controller. But now when I search…
Homnath Bagale
  • 464
  • 1
  • 6
  • 32
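
Assuming CodeIgniter's default routing, where the controller name is the first URL segment, a robots.txt at the web root can disallow the login controller by its path prefix (this also covers /login/method URLs, since rules are prefixes; note that blocking crawling does not guarantee de-indexing):

User-agent: *
Disallow: /login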
-2
votes
3 answers

SEO Help with Pages Indexed by Google

I'm working on optimizing my site for Google's search engine, and lately I've noticed that when doing a "site:www.joemajewski.com" query, I get results for pages that shouldn't be indexed at all. Let's take a look at this page, for example:…
Joe Majewski
  • 1,611
  • 3
  • 18
  • 31
-2
votes
2 answers

How to set robots.txt files for subdomains?

I have a subdomain, e.g. blog.example.com, and I want this domain not to be indexed by Google or any other search engine. I put my robots.txt file in the 'blog' folder on the server with the following configuration: User-agent: * Disallow: / Would it be fine to…
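
robots.txt is resolved per host, so blog.example.com needs its own file at the root of the subdomain's document root; if the 'blog' folder is that document root, the placement is correct. A quick standard-library check that the file is served where crawlers will look (the host is the question's placeholder):

import urllib.request

# robots.txt is fetched per host: this URL governs only blog.example.com.
with urllib.request.urlopen("https://blog.example.com/robots.txt") as resp:
    print(resp.read().decode())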
-2
votes
1 answer

Reading the content of robots.txt in Python and printing it

I want to check if a given website contains a robots.txt file, read all the content of that file, and print it. Maybe adding the content to a dictionary would also be very good. I've tried playing with the robotparser module but can't figure out how to do it.…
The One Electronic
  • 129
  • 1
  • 1
  • 9
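
A minimal standard-library sketch for this, assuming the goal is to fetch, print, and bucket the directives (robotparser is only needed to evaluate rules, not to read the raw file; the site below is a placeholder):

import urllib.error
import urllib.request

def fetch_robots(base_url):
    # Return the robots.txt text for a site, or None if unavailable.
    try:
        with urllib.request.urlopen(base_url.rstrip("/") + "/robots.txt") as resp:
            return resp.read().decode("utf-8", errors="replace")
    except urllib.error.URLError:
        return None

text = fetch_robots("https://www.example.com")
if text:
    print(text)
    rules = {}  # e.g. {"user-agent": ["*"], "disallow": ["/"]}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop robots.txt comments
        if ":" in line:
            field, value = line.split(":", 1)
            rules.setdefault(field.strip().lower(), []).append(value.strip())
    print(rules)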