I'm using Aegir/Barracuda/Nginx to maintain a multisite setup. My "files" directory is symlinked to a mounted "files" directory, so when I clone a site for dev purposes it uses the same "files" directory. The problem with the current practice of using sites/mydomain/files as the location for robots.txt is that I can't put custom directives in my newly cloned development site to stop the crawlers from indexing it, and so I get penalized for duplicate content. Is there a workaround for me?

My files directory pretty much has to be symlinked because it holds a LOT of media files and it wouldn't make sense to recreate the entire "files" directory every time I clone a site.

Meggy

1 Answer


After giving this some thought, you don't even have to let Drupal/Aegir handle the request. robots.txt is a plain text file, so there's no need for Drupal to bootstrap at all. Let nginx handle the request directly:

server {
    server_name server2;
    root /var/server2;

    # Tell nginx that a request to the robots.txt in the files directory should
    # be matched against the robots.txt in our document root.
    location = /files/robots.txt {
        alias $document_root/robots.txt;
    }

    # Directly deliver the robots.txt, no need to bootstrap Drupal Aegir.
    location = /robots.txt {
        try_files $uri =404;
    }

    location / {
        # your normal stuff
    }
}
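
If the goal is specifically to keep crawlers out of the dev clone, you could also have nginx answer with a blanket-disallow robots.txt on the clone's vhost, so the shared files directory is never consulted at all. A minimal sketch, assuming a hypothetical dev.server2 server name and document root for the clone:

server {
    server_name dev.server2;      # hypothetical dev clone vhost
    root /var/dev.server2;

    # Answer both possible robots.txt paths straight from the config
    # with a disallow-everything response; the shared files directory
    # is never touched.
    location = /robots.txt {
        default_type text/plain;
        return 200 "User-agent: *\nDisallow: /\n";
    }

    location = /files/robots.txt {
        default_type text/plain;
        return 200 "User-agent: *\nDisallow: /\n";
    }

    location / {
        # your normal stuff
    }
}

This way the production site keeps serving whatever robots.txt lives in the shared files directory, while the clone always tells crawlers to stay out.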
Fleshgrinder
  • "I'm not clear. All the sites including the cloned site are on one and the same server...?" All cloned sites share the same "files" folder; in other words, the cloned sites share the media files. The cloned sites are for development while the main site is for production, which is why I don't want the cloned sites to be crawled by the search engines. The maintainers of Aegir/provision have decided that the location of robots.txt should be in sites/mydomain/files for multisite installations, but as my files directory is shared by all sites, this doesn't work for me. – Meggy Jun 24 '13 at 23:21
  • I extended my answer to be more verbose. – Fleshgrinder Jun 25 '13 at 09:22
  • Thanks! Does this code go into /etc/nginx/nginx.conf or /var/aegir/.drush/provision/http/Provision/Service/http/nginx.conf? – Meggy Jun 29 '13 at 21:23
  • Also is "# your normal stuff" the stuff that your code has replaced? – Meggy Jun 29 '13 at 21:28
  • Doh! I've just realised how foolish I've been. (very embarrassed) If I simply clone all development sites to a different platform and domain name, I can bypass having the robots.txt in domain/files altogether and just leave it in the platform root. I will mark your answer correct anyway because you did answer my question as I had originally asked it. – Meggy Jun 29 '13 at 22:04