I access SVN repositories via

  • http://svn.example.com/repo1
  • http://svn.example.com/repo2
  • ...

with the following Apache configuration:

LoadModule dav_svn_module     modules/mod_dav_svn.so
LoadModule authz_svn_module   modules/mod_authz_svn.so

<VirtualHost xxx.xxx.xxx.xxx>
    ServerName svn.example.com

    <Location />
        DAV svn
        SVNParentPath /path/to/svn/repositories
        AuthzSVNAccessFile /path/to/svn/conf/auth_policy
        Satisfy Any

        AuthType Basic
        AuthName "Subversion repository"
        AuthUserFile /path/to/svn/conf/passwdfile
        Require valid-user
    </Location>
</VirtualHost>

I would like to prevent web crawlers from indexing the public repositories, but I cannot figure out how to properly set up the configuration to serve robots.txt from http://svn.example.com/robots.txt.
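For context, the file itself is trivial; a robots.txt that asks all well-behaved crawlers to skip the whole site would just be:

```
User-agent: *
Disallow: /
```

The problem is purely that, with `SVNParentPath` mounted at `/`, a request for /robots.txt is interpreted as a request for a repository named "robots.txt" instead of being served from disk.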

I have found a thread "stopping webcrawlers using robots.txt" from 2006, but it didn't help me solve the problem (Ryan's suggestion for redirection didn't work).

EDIT: I would prefer to keep the repositories at the top level rather than moving them to http://svn.example.com/something/reponame.

Mojca
  • It is up to the bot to read (or not) the bots file... do not use it for security – Lucas Jan 08 '14 at 02:29
  • Non-public repositories are password-protected, so it's not about security. Public repositories are ... well, public, but I don't want the results to appear in search engines. – Mojca Jan 08 '14 at 12:45

1 Answer

Don't put your Subversion repositories' virtual directory in the root of your server:

Wrong

<Location />
    DAV svn
    SVNParentPath /path/to/svn/repositories

Right

<Location /svn>
    DAV svn
    SVNParentPath /path/to/svn/repositories

Instead of your repository root being http://svn.example.com, it will be http://svn.example.com/svn. This frees up http://svn.example.com to act as a true document root, which means you can add some documentation about your site and serve a robots.txt file at http://svn.example.com/robots.txt.

Now, a well-behaved robot will see the robots.txt file and not index your Subversion repository.
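Put together, a sketch of the resulting virtual host, based on the configuration from the question (the DocumentRoot path is a placeholder, not something from your setup):

```apache
<VirtualHost xxx.xxx.xxx.xxx>
    ServerName svn.example.com
    # Ordinary filesystem document root: robots.txt, index pages, docs, etc.
    DocumentRoot /var/www/svn.example.com

    <Location /svn>
        DAV svn
        SVNParentPath /path/to/svn/repositories
        AuthzSVNAccessFile /path/to/svn/conf/auth_policy
        Satisfy Any

        AuthType Basic
        AuthName "Subversion repository"
        AuthUserFile /path/to/svn/conf/passwdfile
        Require valid-user
    </Location>
</VirtualHost>
```

A robots.txt placed in /var/www/svn.example.com can then disallow /svn/ (or everything) for all crawlers.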

David W.
  • I'm aware that moving repositories one level down would solve the problem of `robots.txt`, but moving repositories to another URL is precisely something that I would like to avoid. I already have "svn" as a subdomain and introducing another "svn" seems superfluous, involves more typing + all the users would need to fix their checkouts. (In all honesty I would probably prefer spending time blacklisting IPs from different bots than moving repositories to a new location.) – Mojca Jan 09 '14 at 11:01
  • You **might** be able to do both a document root and an SVN repo at the same virtual root directory. Try defining the DocumentRoot and ServerRoot directives in your httpd.conf file before you define your `<Location>` for your SVN repositories. See if you can create an `index.html` that will display at the URL root. I know this was possible in older versions of Apache httpd. It's worth giving it a shot. Just be careful that you could end up with collisions if someone defines an HTML directory the same as a Subversion repo. – David W. Jan 09 '14 at 20:38
  • @DavidW.: It needs to be called `robots.txt` (not `robot.txt`). – unor Jan 11 '14 at 02:24
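The approach David W. sketches in his follow-up comment would look roughly like this. This is an untested sketch: whether a nested `DAV off` block actually carves robots.txt out of a root-mounted `SVNParentPath`, and the DocumentRoot path itself, are assumptions to verify on your own setup:

```apache
# Filesystem root for non-repository files (placeholder path)
DocumentRoot /var/www/svn.example.com

<Location />
    DAV svn
    SVNParentPath /path/to/svn/repositories
    # auth directives as in the original configuration
</Location>

# Carve robots.txt out of the DAV handler so the request falls
# through to the file under DocumentRoot instead of being treated
# as a repository named "robots.txt".
<Location /robots.txt>
    DAV off
</Location>
```

If this works, the repositories stay at the top level (as the question prefers) while http://svn.example.com/robots.txt is served from disk.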