Questions tagged [robots.txt]

Robots.txt (the Robots Exclusion Protocol) is a text file placed in the root of a web site domain to give instructions to compliant web robots (such as search engine crawlers) about what pages to crawl and not crawl, as well as other information such as a Sitemap location. In modern frameworks it can be useful to programmatically generate the file. General questions about Search Engine Optimization are more appropriate on the Webmasters StackExchange site.

Website owners use the /robots.txt file to give instructions about their site to web robots; this is called The Robots Exclusion Protocol.

It works like this: a robot wants to visit a website URL, say http://www.example.com/welcome.html. Before it does so, it first checks for http://www.example.com/robots.txt, and finds:

User-agent: *
Disallow: /

The "User-agent: *" means this section applies to all robots. The "Disallow: /" tells the robot that it should not visit any pages on the site.
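Python's standard-library `urllib.robotparser` module can show how a compliant crawler interprets this example. The sketch below runs offline by passing the file's lines to `parse()`. The crawler name `ExampleBot` is hypothetical:

```python
from urllib.robotparser import RobotFileParser

# The robots.txt content from the example above.
ROBOTS_TXT = """\
User-agent: *
Disallow: /
"""

parser = RobotFileParser()
# parse() takes an iterable of lines, so no network request is made.
parser.parse(ROBOTS_TXT.splitlines())

# "ExampleBot" is a hypothetical crawler; the "*" group applies to it.
allowed = parser.can_fetch("ExampleBot", "http://www.example.com/welcome.html")
print(allowed)  # False: "Disallow: /" matches every path on the site
```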

There are two important considerations when using /robots.txt:

  • robots can ignore your /robots.txt. In particular, malware robots that scan the web for security vulnerabilities and email address harvesters used by spammers will pay no attention.
  • the /robots.txt file is a publicly available file. Anyone can see what sections of your server you don't want robots to use, so don't try to use /robots.txt to hide information.
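Nothing in the protocol enforces these rules. The same public file that guides polite crawlers also tells any reader which paths the site owner would rather keep out of view. The minimal sketch below lists the Disallow paths from a robots.txt body, as any visitor could. The example paths and the helper name `disallowed_paths` are hypothetical:

```python
# Hypothetical robots.txt, as anyone could download it from /robots.txt.
EXAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /admin/
Disallow: /backup/  # comments are allowed after '#'
Allow: /
"""


def disallowed_paths(robots_txt: str) -> list[str]:
    """Return every non-empty Disallow path in a robots.txt body."""
    paths = []
    for raw_line in robots_txt.splitlines():
        # Strip any trailing comment and surrounding whitespace.
        line = raw_line.split("#", 1)[0].strip()
        field, sep, value = line.partition(":")
        # An empty "Disallow:" means "allow everything", so skip it.
        if sep and field.strip().lower() == "disallow" and value.strip():
            paths.append(value.strip())
    return paths


print(disallowed_paths(EXAMPLE_ROBOTS_TXT))  # ['/admin/', '/backup/']
```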

More information can be found at http://www.robotstxt.org/.

1426 questions
-1
votes
1 answer

Disallow search link on a website with robots.txt

I want to disallow the search link on my site in the robots.txt. After I click the search submit button, my URL will look like: example.com/searching?k=something How can I write this URL address into my robots.txt file? My robots.txt file looks…
Parkolo11
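Rules match URLs by path prefix, and the query string counts as part of the path. A single `Disallow: /searching` line should therefore cover `/searching?k=something`. The sketch below checks this assumption with Python's `urllib.robotparser`. `ExampleBot` and the `/products` path are hypothetical:

```python
from urllib.robotparser import RobotFileParser

# Assumed rule: block every URL whose path starts with /searching,
# including any query string such as ?k=something.
ROBOTS_TXT = """\
User-agent: *
Disallow: /searching
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

for url in ("http://example.com/searching?k=something",
            "http://example.com/searching",
            "http://example.com/products"):
    # Prints False for the two /searching URLs and True for /products.
    print(url, parser.can_fetch("ExampleBot", url))
```

Because matching is by prefix, this rule also blocks any other path that begins with `/searching`.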
-1
votes
2 answers

Robots.txt disallow by regex

On my website I have a page for the cart, that is: http://www.example.com/cart and another for the cartoons: http://www.example.com/cartoons. What should I write in my robots.txt file to block only the cart page? The cart page does not accept an…
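Because rules match by path prefix, `Disallow: /cart` on its own would also block `/cartoons`. One way around this without wildcards is to list a more specific `Allow` line first. Python's `urllib.robotparser` applies the first rule that matches, while Google documents that the most specific (longest) matching rule wins. This ordering gives the same result under both. The sketch below is illustrative only; `/cart/checkout` and `/cartoons/42` are hypothetical paths:

```python
from urllib.robotparser import RobotFileParser

# The more specific Allow line comes first. urllib.robotparser uses the
# first rule that matches; Google uses the longest matching rule. Both
# give the same answers for this ordering.
ROBOTS_TXT = """\
User-agent: *
Allow: /cartoons
Disallow: /cart
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

for path in ("/cart", "/cart/checkout", "/cartoons", "/cartoons/42"):
    url = "http://www.example.com" + path
    print(path, parser.can_fetch("ExampleBot", url))
```

Googlebot also treats a trailing `$` as an end-of-URL anchor, so `Disallow: /cart$` is another option for crawlers that support it. `urllib.robotparser` does not.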
-1
votes
1 answer

Preventing search engines from indexing all posts

I'm working on a WordPress site where I'm using the posts to create a list of tour dates for an entertainer. With ACF I have fields set up in a table and the client just enters a date, location, link to buy tickets, etc. The table is all I need…
-1
votes
1 answer

Will altering the robots.txt file require an IIS restart?

I am about to amend the robots.txt file for a site, but I am curious whether, once the amendment is made, I will need to restart IIS for it to fully take effect. Thanks in advance
markabarmi
-1
votes
1 answer

Robots.txt is not accessible for Opencart website

When I enter example.com/robots.txt, my OpenCart website gives me a 404 error. It should work like https://www.daraz.pk/robots.txt. Please help.
Zeshan
-1
votes
2 answers

How to let robots parse our custom html elements?

I have a website containing custom elements (I use Angular 2), and Google fails to parse them correctly: It only sees It seems that the value of this component is not retrieved at all by Google's robots. Is there a best…
matth3o
-1
votes
1 answer

Fetch as Google - Status partial

I used Google Webmaster Tools to "Fetch as Google", but I receive the response "partial" because some resources are blocked. When I check the result, I can read that Googlebot couldn't get all resources for this page. I have a list with all the…
mattia
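One rough way to find which of the listed resources are blocked is to test each URL against the site's robots.txt. The sketch below uses Python's `urllib.robotparser`. The robots.txt rules, the resource URLs, and the helper name `blocked_resources` are all hypothetical. Note that `urllib.robotparser` only does prefix matching, so it will not reproduce Google's handling of `*` or `$` in rules:

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that blocks the directories holding CSS and JS.
ROBOTS_TXT = """\
User-agent: *
Disallow: /assets/
Disallow: /js/
"""

# Hypothetical resources the page needs in order to render.
RESOURCES = [
    "https://www.example.com/assets/site.css",
    "https://www.example.com/js/app.js",
    "https://www.example.com/images/logo.png",
]


def blocked_resources(robots_txt, urls, user_agent="Googlebot"):
    """Return the URLs that robots_txt forbids for user_agent."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [url for url in urls if not parser.can_fetch(user_agent, url)]


print(blocked_resources(ROBOTS_TXT, RESOURCES))
```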
-1
votes
1 answer

Making my website mobile friendly - Google mobile friendly test

I've been busy building a mobile-friendly version of my website at http://mobilereactor.co.uk/, so I thought I'd test it at https://www.google.co.uk/webmasters/tools/mobile-friendly/. My website is now mobile friendly as far as I can see on my mobile…
mitchelangelo
-1
votes
1 answer

Disallow certain URLs in robots.txt

I'm currently running a web service where people can browse products. The URL for that is basically just /products/product_pk/. However, we don't serve products with certain product_pks, e.g. nothing smaller than 200. Is there hence a way to…
pasql
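robots.txt has no syntax for numeric ranges, but the file can be generated programmatically with one `Disallow` line per excluded ID. The minimal sketch below assumes that product IDs start at 1 and that product URLs always end with a trailing slash. `build_robots_txt` is a hypothetical helper name:

```python
from urllib.robotparser import RobotFileParser


def build_robots_txt(min_pk: int) -> str:
    """Disallow /products/<pk>/ for every pk from 1 up to min_pk - 1."""
    lines = ["User-agent: *"]
    # The trailing slash keeps /products/15/ from also matching /products/150/.
    lines += [f"Disallow: /products/{pk}/" for pk in range(1, min_pk)]
    return "\n".join(lines) + "\n"


robots_txt = build_robots_txt(200)

# Check a few URLs with the standard-library parser.
parser = RobotFileParser()
parser.parse(robots_txt.splitlines())
print(parser.can_fetch("ExampleBot", "https://example.com/products/150/"))   # False
print(parser.can_fetch("ExampleBot", "https://example.com/products/1500/"))  # True
```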
-1
votes
1 answer

How to ignore some links in my website?

I am working on a small PHP script and I have some links like this: *-phones-*.html, where the * parts are variables. I want to stop Google from indexing this kind of link using robots.txt. Is it possible?
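Python's `urllib.robotparser` treats rule paths as plain prefixes. RFC 9309 and Googlebot, however, also recognize `*` (any sequence of characters) and a trailing `$` (end of URL). The minimal sketch below covers only that pattern matching, without the rule-precedence logic a full parser needs. It assumes the links are served from the site root. `rule_to_regex` and `is_matched` are hypothetical helper names:

```python
import re

# Rule for the question's links, assuming they live under the site root:
#   User-agent: *
#   Disallow: /*-phones-*.html


def rule_to_regex(rule_path):
    """Compile a robots.txt rule path that may contain * and a final $.

    '*' matches any run of characters and a trailing '$' anchors the end
    of the URL; all other characters match literally from the start of
    the path, as a prefix.
    """
    anchored = rule_path.endswith("$")
    if anchored:
        rule_path = rule_path[:-1]
    pattern = ".*".join(re.escape(part) for part in rule_path.split("*"))
    return re.compile(pattern + (r"\Z" if anchored else ""))


def is_matched(path, rule_path):
    # re.match only anchors at the start, which gives prefix matching.
    return rule_to_regex(rule_path).match(path) is not None


for path in ("/samsung-phones-cheap.html", "/phones.html", "/about.html"):
    print(path, is_matched(path, "/*-phones-*.html"))
```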
-1
votes
1 answer

Excluding one site on a shared VPS

I have a server with multiple websites, of which I want to block only one. I know that robots.txt accepts the following: User-agent: * Disallow: / To block bots from crawling the site, but there is ambiguous language in the articles I read. Some…
symlink
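Crawlers look for robots.txt separately at the root of each scheme, host, and port. On a shared server, `Disallow: /` therefore affects only the site whose own document root serves that file. The sketch below shows how a crawler derives the robots.txt location from a page URL. The host names and the helper name `robots_url` are hypothetical:

```python
from urllib.parse import urlsplit, urlunsplit


def robots_url(page_url: str) -> str:
    """Return the robots.txt URL a crawler checks before fetching page_url.

    Only the scheme and network location (host and optional port) are
    kept; the path, query and fragment of the page are discarded.
    """
    parts = urlsplit(page_url)
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


# Two sites on the same server each have their own robots.txt.
print(robots_url("http://blog.example.com/posts/1?x=2"))
print(robots_url("https://shop.example.net:8443/cart"))
```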
-1
votes
1 answer

Domain & Sub domain - On Page Optimization

I have a website. Main domain: www.domain.com. Subdomains: www.movie.tollywood.domain.com, www.movie.hollywood.domain.com, www.songs.hollywood.domain.com. I don't want any search engine to crawl and index the subdomains.
Rakesh
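Each subdomain is a separate host, and a crawler requests /robots.txt from each one. When one application serves all of them, one option is to generate the file per request from the `Host` header. The minimal WSGI sketch below uses only the standard library. `MAIN_HOST` is taken from the question. `robots_app` is a hypothetical name, and the app is assumed to be routed only for the /robots.txt path:

```python
MAIN_HOST = "www.domain.com"  # main domain from the question

ALLOW_ALL = "User-agent: *\nDisallow:\n"   # empty Disallow allows everything
BLOCK_ALL = "User-agent: *\nDisallow: /\n"


def robots_app(environ, start_response):
    """WSGI app for /robots.txt that blocks every host except MAIN_HOST."""
    # HTTP_HOST may include a port, e.g. "www.domain.com:8080".
    host = environ.get("HTTP_HOST", "").split(":")[0].lower()
    body = ALLOW_ALL if host == MAIN_HOST else BLOCK_ALL
    data = body.encode("utf-8")
    start_response("200 OK", [("Content-Type", "text/plain; charset=utf-8"),
                              ("Content-Length", str(len(data)))])
    return [data]
```

For a quick local test, it can be served with `wsgiref.simple_server.make_server("", 8000, robots_app).serve_forever()`.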
-1
votes
1 answer

How to allow multiple robots in robots.txt

User-Agent: *
Disallow: /
User-Agent: Googlebot
User-Agent: Googlebot-Mobile
User-Agent: Googlebot-Image
User-Agent: Bingbot
Allow: /
Does this allow all 4 bots or just allow Bingbot and disallow everything? Edit My Answer could be useful for people…
Jigong Bagong
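Under the robots.txt grouping rules, consecutive `User-agent` lines share the rules that follow them. A crawler follows the group that names it and falls back to the `*` group only when no other group matches. The sketch below checks the question's rules with Python's `urllib.robotparser`. `SomeOtherBot` and the page URL are hypothetical:

```python
from urllib.robotparser import RobotFileParser

# The rules from the question, one directive per line. The four
# consecutive User-agent lines form one group sharing "Allow: /".
ROBOTS_TXT = """\
User-agent: *
Disallow: /

User-agent: Googlebot
User-agent: Googlebot-Mobile
User-agent: Googlebot-Image
User-agent: Bingbot
Allow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

for agent in ("Googlebot", "Googlebot-Image", "Bingbot", "SomeOtherBot"):
    # The named bots are allowed; any other bot falls back to "*".
    print(agent, parser.can_fetch(agent, "https://example.com/page"))
```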
-1
votes
1 answer

Couple of questions about robots and content blocking

I'm configuring the robots.txt file and can't really understand which directories I should block from robots. Of course, I've read some information on the internet, but there's still a gap between what I want to know and what I've found so far.…
dotzzy
-1
votes
1 answer

How do I set up a robots.txt which allows all pages EXCEPT the main page?

If I have a site called http://example.com, and under it I have articles, such as: http://example.com/articles/norwegian-statoil-ceo-resigns Basically, I don't want the text from the front page to show in Google results, so that when you search for…