
I am trying to set robots.txt for all virtual hosts under an nginx HTTP server. I was able to do it in Apache by putting the following in the main httpd.conf:

<Location "/robots.txt">
    SetHandler None
</Location>
Alias /robots.txt /var/www/html/robots.txt

I tried doing something similar with nginx by adding the lines given below, both (a) directly within nginx.conf and (b) as an included conf.d/robots.conf:

location ^~ /robots.txt {
        alias /var/www/html/robots.txt;
}

I have tried with '=' and even put it in one of the virtual hosts to test it. Nothing seemed to work.

What am I missing here? Is there another way to achieve this?

anup
  • Note: There was no way to put it as a global setting (i.e. set in one file that applies to all virtual hosts without an include statement). One can set a robots.conf in conf.d (or global.d [non-standard]) and include that in every virtual host config. All other answers point to various ways of doing the same thing, viz. proxy_pass, return {}, etc. – anup Aug 27 '18 at 10:53
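For reference, a minimal sketch of that include-based approach (the snippet path and server_name below are examples, not taken from the original configs):

# /etc/nginx/global.d/robots.conf  (example path; this directory must not be
# auto-included at the http level, since "location" is only valid inside "server")
location = /robots.txt {
    alias /var/www/html/robots.txt;
}

# Then in each virtual host's config:
server {
    listen 80;
    server_name example.com;                      # placeholder name
    include /etc/nginx/global.d/robots.conf;
    # ... rest of the vhost config ...
}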

5 Answers


You can set the contents of the robots.txt file directly in the nginx config:

location = /robots.txt { return 200 "User-agent: *\nDisallow: /\n"; }

It is also possible to add the correct Content-Type:

location = /robots.txt {
   add_header Content-Type text/plain;
   return 200 "User-agent: *\nDisallow: /\n";
}
Vishal Singh
  • Just a note: I needed to put `location = /robots.txt` (note the equals sign), otherwise another `location ~* \.(txt|log)$` match below it was overriding it. – Beebee Nov 28 '17 at 11:10
  • How could this be added to a tidy `conf.d/robots.conf`? As is, it gives _"location" directive is not allowed here_, which is reasonable, but it's not for a particular server. I'm not sure about @user79644's answer to this. Is it inevitable to add this to each site? – Pablo A Feb 02 '18 at 17:18
  • I haven't tested this, but it looks similar to the one in the question, except that a 'return' is used in place of alias. The issue I faced is making it a global setting, which means I shouldn't have to repeat it in every website's .conf. I couldn't get the global method to work the way it works with Apache. Say for instance a development server that shouldn't be crawled. – anup Aug 27 '18 at 10:44

Are there other rules defined? Maybe common.conf or another conf file is included which is overriding your config. One of the following should definitely work.

location /robots.txt { alias /home/www/html/robots.txt; }
location /robots.txt { root /home/www/html/;  }
  1. Nginx checks all "regexp" locations in the order of their appearance. If any "regexp" location matches, Nginx uses that first match. If no "regexp" location matches, Nginx uses the ordinary (prefix) location found in the previous step.
  2. "regexp" locations have precedence over "prefix" locations.
user79644
  • It doesn't work as a global option, but it works within a virtual host's config. I used the first one (location /robots.txt) and even the one I specified in the question ('~* /robots.txt'). Both worked from within the virtual host's config. I think 'location' and 'if {}' fall under the 'server' directive, and this perhaps does not work at the global level. – anup Oct 29 '13 at 11:17
  • Make sure you have a `/robots.txt` file to alias. I didn't get the `root` option to work. – Shadoath Jul 10 '17 at 18:35

location cannot be used inside the http block. nginx does not have global aliases (i.e., aliases that can be defined for all vhosts). Save your global definitions in a folder and include those.

server {
  listen 80;
  root /var/www/html;
  include /etc/nginx/global.d/*.conf;
}
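The included file itself would then hold only the location block; for example (the filename is an assumption):

# /etc/nginx/global.d/robots.conf -- example filename
location = /robots.txt {
    alias /var/www/html/robots.txt;
}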
user79644
  • As given in the question, I had tried doing so by putting robots.conf in the conf.d folder, but it doesn't work globally. – anup Feb 07 '14 at 07:09
  • cont'd... Like you said, Nginx doesn't have global aliases. Eventually the resolution was to add it per virtual host config. – anup Feb 07 '14 at 07:27

You could also just serve the robots.txt directly:

location /robots.txt {
   return 200 "User-agent: *\nDisallow: /\n";
}
Ben Bieler

I had the same issue with ACME challenges, but the same principle applies to your case as well.

What I did to solve this was to move all my sites to a non-standard port (I picked 8081) and create a virtual server listening on port 80. It proxies all requests to 127.0.0.1:8081, except the ones to /.well-known. This acts almost like a global alias, with one extra hop, but that shouldn't cause a significant drop in performance due to the async nature of nginx.

upstream nonacme {
  server 127.0.0.1:8081;
}

server {
  listen 80;

  access_log  /var/log/nginx/acme-access.log;
  error_log   /var/log/nginx/acme-error.log;

  location /.well-known {
    root /var/www/acme;
  }

  location / {
    proxy_set_header    Host                $http_host;
    proxy_set_header    X-Real-IP           $remote_addr;
    proxy_set_header    X-Forwarded-For     $proxy_add_x_forwarded_for;
    proxy_set_header    X-Forwarded-Proto   $scheme;
    proxy_set_header    X-Frame-Options     SAMEORIGIN;

    # WebSocket support (nginx 1.4)
    proxy_http_version 1.1;
    proxy_set_header Upgrade $http_upgrade;
    proxy_set_header Connection "upgrade";

    proxy_pass http://nonacme;
  }
}
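To cover the robots.txt case from the question with this setup, a block like the following could be added inside the port-80 server above, alongside the /.well-known location (the alias path is an assumption):

  location = /robots.txt {
    alias /var/www/html/robots.txt;
  }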