Questions tagged [robots.txt]

Robots.txt (the Robots Exclusion Protocol) is a text file placed at the root of a website's domain to give instructions to compliant web robots (such as search engine crawlers) about which pages to crawl and not crawl, as well as other information such as a Sitemap location. In modern frameworks it can be useful to generate the file programmatically. General questions about Search Engine Optimization are more appropriate on the Webmasters Stack Exchange site.

Website owners use the /robots.txt file to give instructions about their site to web robots; this is called The Robots Exclusion Protocol.

It works like this: a robot wants to visit a website URL, say http://www.example.com/welcome.html. Before it does so, it first checks for http://www.example.com/robots.txt, and finds:

User-agent: *
Disallow: /

The "User-agent: *" means this section applies to all robots. The "Disallow: /" tells the robot that it should not visit any pages on the site.

There are two important considerations when using /robots.txt:

  • Robots can ignore your /robots.txt. In particular, malware robots that scan the web for security vulnerabilities, and email address harvesters used by spammers, will pay no attention to it.
  • The /robots.txt file is publicly available. Anyone can see which sections of your server you don't want robots to use, so don't try to use /robots.txt to hide information.

More information can be found at http://www.robotstxt.org/.

1426 questions
-2
votes
1 answer

How to stop users from accessing the robots.txt file on a website?

I need to stop users from accessing the robots.txt file on my website. I am not sure whether, if I add a 301 redirect for robots.txt in .htaccess, Google may discard the robots.txt, so please advise me about this.
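
One imperfect workaround, assuming Apache with mod_rewrite: serve robots.txt only to clients whose User-Agent resembles a known crawler. The bot names below are illustrative, and User-Agent strings are trivially spoofed, so this is obfuscation rather than access control:

RewriteEngine On
# Return 403 Forbidden for robots.txt unless the User-Agent matches
# one of these (illustrative, spoofable) crawler tokens.
RewriteCond %{HTTP_USER_AGENT} !(Googlebot|Bingbot|DuckDuckBot) [NC]
RewriteRule ^robots\.txt$ - [F]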
-2
votes
1 answer

How can I block bots from crawling user-generated URLs?

I am working on an eCommerce site, and it creates URLs after a user searches in the search bar or filters products. For example, if a user searches for an apple, the website will have a URL like example.com/s=apple? or something like that, and the same for if…
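
A sketch of one approach, assuming the crawlers you care about support wildcards in robots.txt (Google and Bing do; the original protocol does not). The parameter names here are placeholders for whatever the shop actually generates:

User-agent: *
Disallow: /*?s=
Disallow: /*?*filter=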
-2
votes
1 answer

How do I block TinEye spider?

I know how to block Google image search through robots.txt; is there something similar for TinEye? I do not want my sites to be indexed by them.
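
Assuming the crawler honors robots.txt, TinEye is blocked like any other bot, by matching its User-Agent token. TinEye has documented a crawler token of TinEye-bot; verify the current token against their documentation before relying on this:

User-agent: TinEye-bot
Disallow: /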
-2
votes
1 answer

How can I stop x-robots-tag from setting noindex on my entire site?

I have an up-to-date WordPress site, running WooCommerce and Yoast, that has the following noindex set in the headers on every single page: x-robots-tag: noindex, nofollow, nosnippet, noarchive. I'm not sure where it's coming from. The only…
Twentyonehundred
  • 2,199
  • 2
  • 17
  • 28
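
A useful first diagnostic is to confirm which responses actually send the header, since it may come from the server configuration, a plugin, or WordPress's "Discourage search engines" setting. A minimal standard-library sketch (example.com is a placeholder):

import urllib.request

# Request only the headers and print any X-Robots-Tag value.
req = urllib.request.Request("https://example.com/", method="HEAD")
with urllib.request.urlopen(req) as resp:
    print(resp.headers.get("X-Robots-Tag"))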
-2
votes
2 answers

How to block a certain type of URL in robots.txt or .htaccess?

Currently on my webshop, on category pages with many pages, the URLs end with https://www.example.com?p=2, p=3... I want to use robots.txt to stop URLs ending in p=Number from being indexed. How do I do this? It's a PrestaShop website, by the way.…
JohnDickinson
  • 97
  • 1
  • 3
  • 11
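
For crawlers that support wildcards (Google and Bing do), the paginated URLs can be matched on their query string. Note that robots.txt controls crawling rather than indexing, so rel="canonical" or a noindex header may fit the goal better; a sketch:

User-agent: *
Disallow: /*?p=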
-2
votes
1 answer

robots.txt format to disallow all sub-URLs but not the root URL itself

My application URLs are like the following: http://example.com/app/1 http://example.com/app/2 http://example.com/app/3 ... http://example.com/app/n. Now I want to block all these URLs from crawling, but not http://example.com/app. How can I do this…
user8073054
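
Because robots.txt rules are path prefixes, a trailing slash is enough: the rule below matches /app/1, /app/2, and so on, but not /app itself. A minimal sketch:

User-agent: *
Disallow: /app/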
-2
votes
1 answer

Google indexation

I have a newbie question about Google indexation. We created a web app on a LAMP stack, where the app and a WordPress landing page share the same server. My question is: when indexing the landing page, will Google robots go through the login link…
AB Moslih
  • 45
  • 1
  • 9
-2
votes
1 answer

Link Juice Prioritize www. and domain name

Our company ran a site audit on a very basic one-page website and is receiving this warning: "Normally, a webpage can be accessed with or without adding www. to its domain name. If you haven't specified which version should be prioritized,…
Benjamin
  • 697
  • 1
  • 8
  • 32
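
One common fix on Apache is a site-wide 301 redirect to the preferred host (here from the bare domain to www; swap the direction as needed, and treat example.com as a placeholder):

RewriteEngine On
# Permanently redirect example.com/... to www.example.com/...
RewriteCond %{HTTP_HOST} ^example\.com$ [NC]
RewriteRule ^(.*)$ https://www.example.com/$1 [R=301,L]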
-2
votes
1 answer

Allow and Disallow in Robots.txt

http://www.robotstxt.org/orig.html says: Disallow: /help disallows both /help.html and /help/index.html. Now, google.com/robots.txt lists: Disallow: /search Allow: /search/about. Upon running robotparser.py, it returns false for both the…
Romy
  • 11
  • 2
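
The observed behavior is reproducible: Python's urllib.robotparser applies rules in file order and the first match wins, so Disallow: /search shadows the later Allow: /search/about, whereas Google uses longest-match precedence. A small sketch:

from urllib.robotparser import RobotFileParser

rp = RobotFileParser()
rp.parse([
    "User-agent: *",
    "Disallow: /search",
    "Allow: /search/about",
])
# First matching rule wins in urllib.robotparser, so both are blocked;
# Google's longest-match semantics would allow /search/about.
print(rp.can_fetch("*", "/search"))        # False
print(rp.can_fetch("*", "/search/about"))  # False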
-2
votes
2 answers

Robots.txt Error on Google Search Console

While submitting my website's robots.txt to Google Search Console, it shows an error like the screenshot below.
-2
votes
1 answer

My robots.txt file is not reflecting the new content

I have uploaded a new robots.txt file through cPanel into the public_html directory, but when I browse it through a URL like www.mydomian.com/robots.txt, it's not showing the changes and not refreshing the new content. I have rechecked everything, I can…
-2
votes
1 answer

How to use robots.txt to disallow some controllers in CodeIgniter

I'm quite new to the topic of robots.txt. I looked into it for a few hours and tried to implement it. I have controllers named login and view. All I want is for Google search to list my view controller only, not the login controller. But now when I search…
Homnath Bagale
  • 464
  • 1
  • 6
  • 32
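
Assuming CodeIgniter's default routing, where the controller name is the first URL segment, a robots.txt at the web root can disallow the login controller by its path prefix (this also covers /login/method URLs, since rules are prefixes; note that blocking crawling does not guarantee de-indexing):

User-agent: *
Disallow: /login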
-2
votes
3 answers

SEO Help with Pages Indexed by Google

I'm working on optimizing my site for Google's search engine, and lately I've noticed that when doing a "site:www.joemajewski.com" query, I get results for pages that shouldn't be indexed at all. Let's take a look at this page, for example:…
Joe Majewski
  • 1,611
  • 3
  • 18
  • 31
-2
votes
2 answers

How to set robots.txt files for subdomains?

I have a subdomain, e.g. blog.example.com, and I want this domain not to be indexed by Google or any other search engine. I put my robots.txt file in the 'blog' folder on the server with the following configuration: User-agent: * Disallow: / Would it be fine to…
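
robots.txt is resolved per host, so blog.example.com needs its own file at the root of the subdomain's document root; if the 'blog' folder is that document root, the placement is correct. A quick standard-library check that the file is served where crawlers will look (the host is the question's placeholder):

import urllib.request

# robots.txt is fetched per host: this URL governs only blog.example.com.
with urllib.request.urlopen("https://blog.example.com/robots.txt") as resp:
    print(resp.read().decode())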
-2
votes
1 answer

Reading the content of robots.txt in Python and printing it

I want to check if a given website contains a robots.txt file, read all the content of that file, and print it. Maybe adding the content to a dictionary would also be very good. I've tried playing with the robotparser module but can't figure out how to do it.…
The One Electronic
  • 129
  • 1
  • 1
  • 9
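
A minimal standard-library sketch for this, assuming the goal is to fetch, print, and bucket the directives (robotparser is only needed to evaluate rules, not to read the raw file; the site below is a placeholder):

import urllib.error
import urllib.request

def fetch_robots(base_url):
    # Return the robots.txt text for a site, or None if unavailable.
    try:
        with urllib.request.urlopen(base_url.rstrip("/") + "/robots.txt") as resp:
            return resp.read().decode("utf-8", errors="replace")
    except urllib.error.URLError:
        return None

text = fetch_robots("https://www.example.com")
if text:
    print(text)
    rules = {}  # e.g. {"user-agent": ["*"], "disallow": ["/"]}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop robots.txt comments
        if ":" in line:
            field, value = line.split(":", 1)
            rules.setdefault(field.strip().lower(), []).append(value.strip())
    print(rules)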