
I host a few git repositories at git.nomeata.de using gitweb (and gitolite). Occasionally, a search engine spider comes along and begins to hammer the interface. Since I generally do want my git repositories to show up in search engines, I do not want to block spiders completely, but they should not be able to invoke expensive operations such as snapshotting the archive, searching, or generating diffs.

What is the “best” robots.txt file for such an installation?

Joachim Breitner

1 Answer


I guess this makes a good community wiki. Please extend this robots.txt if you think it can be improved:

User-agent: * 
Disallow: /*a=search*
Disallow: /*/search/*
Disallow: /*a=blobdiff*
Disallow: /*/blobdiff/*
Disallow: /*a=commitdiff*
Disallow: /*/commitdiff/*
Disallow: /*a=snapshot*
Disallow: /*/snapshot/*
Disallow: /*a=blame*
Disallow: /*/blame/*
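
To sanity-check the patterns, here is a rough Python sketch (not authoritative: it only approximates how crawlers that support the `*` wildcard extension, such as Googlebot, match these rules, and the sample URLs are made-up illustrations of gitweb's query-string and path-based link styles):

import re

disallow = [
    "/*a=search*",     "/*/search/*",
    "/*a=blobdiff*",   "/*/blobdiff/*",
    "/*a=commitdiff*", "/*/commitdiff/*",
    "/*a=snapshot*",   "/*/snapshot/*",
    "/*a=blame*",      "/*/blame/*",
]

def pattern_to_regex(pattern):
    # '*' matches any character sequence; everything else is taken literally.
    return re.compile(".*".join(re.escape(part) for part in pattern.split("*")))

rules = [pattern_to_regex(p) for p in disallow]

def blocked(path_and_query):
    # Crawlers match Disallow rules against the path plus the query string.
    return any(rule.match(path_and_query) for rule in rules)

# Expensive operations should be blocked...
assert blocked("/?p=repo.git;a=snapshot;h=HEAD;sf=tgz")
assert blocked("/repo.git/commitdiff/abc123")
# ...while cheap pages such as the summary or the log stay crawlable.
assert not blocked("/?p=repo.git;a=summary")
assert not blocked("/repo.git/log/")
print("all pattern checks passed")

Both URL styles can occur because gitweb generates either classic query-string links or path-style links, depending on whether the pathinfo feature is enabled, which is why the list above contains a `/*/xxx/*` entry for every `/*a=xxx*` one.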
Joachim Breitner
  • I've found that, depending on how gitweb is configured, these URLs are not enough; e.g. in my installation I needed to add, for each `/*a=xxx*` entry, one of the form `/*/xxx/*`. – iustin Mar 14 '15 at 12:43