Basic requirements:
- Should be able to index sources like MediaWiki, Confluence, SharePoint, GitHub Enterprise, and Askbot.
- Should be reasonably smart about de-duping results (one reason Confluence search is so painful).
- Should incorporate ranking heuristics such as how many pages link to a document and whether the search terms appear in the document's title. A way for users to downrank particular results would be a bonus.
- Should be somewhat tunable (e.g., prefer Confluence over SharePoint, blacklist certain paths).
Are there off-the-shelf products that can do the above? FOSS projects? Are there FOSS projects that can provide the basics for the above and are easy to extend or build a frontend for?
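To make the de-duping, ranking, and tuning requirements concrete, here is a rough Python sketch of the behavior I have in mind. It is not taken from any existing product: every name in it (`Doc`, `SOURCE_WEIGHTS`, `BLOCKED_PREFIXES`, `search`) and every weight is a made-up assumption for illustration, and a real engine would use a proper index and near-duplicate detection rather than an exact hash.

```python
import hashlib
import math
from dataclasses import dataclass

# Hypothetical tuning knobs; the values are illustrative only.
SOURCE_WEIGHTS = {"confluence": 1.5, "mediawiki": 1.2, "sharepoint": 0.8}
BLOCKED_PREFIXES = ("/archive/", "/tmp/")


@dataclass
class Doc:
    source: str
    path: str
    title: str
    body: str
    inbound_links: int = 0  # how many other pages link to this one
    downvotes: int = 0      # user "downrank" feedback


def fingerprint(doc):
    """Hash the case- and whitespace-normalized body to spot exact duplicates."""
    normalized = " ".join(doc.body.lower().split())
    return hashlib.sha1(normalized.encode("utf-8")).hexdigest()


def score(doc, terms):
    """Combine term frequency, title hits, link count, source weight, downvotes."""
    title = doc.title.lower()
    body = doc.body.lower()
    term_hits = sum(body.count(t) for t in terms)
    title_boost = 2.0 * sum(1 for t in terms if t in title)
    link_boost = math.log1p(doc.inbound_links)
    weight = SOURCE_WEIGHTS.get(doc.source, 1.0)
    return weight * (term_hits + title_boost + link_boost) - 0.5 * doc.downvotes


def search(docs, query):
    terms = query.lower().split()
    best = {}  # fingerprint -> (score, doc); keeps one copy per duplicate group
    for doc in docs:
        if doc.path.startswith(BLOCKED_PREFIXES):
            continue  # path is blocked by configuration
        haystack = (doc.title + " " + doc.body).lower()
        if not any(t in haystack for t in terms):
            continue  # no search term matches at all
        s = score(doc, terms)
        key = fingerprint(doc)
        if key not in best or s > best[key][0]:
            best[key] = (s, doc)
    ranked = sorted(best.values(), key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in ranked]


docs = [
    Doc("confluence", "/eng/deploy", "Deploy guide",
        "How to deploy the service", inbound_links=5),
    Doc("sharepoint", "/sites/eng/deploy", "Deploy guide copy",
        "How to  deploy the SERVICE"),
    Doc("mediawiki", "/archive/old-deploy", "Old deploy", "deploy deploy deploy"),
    Doc("mediawiki", "/wiki/lunch", "Lunch menu", "pizza"),
]
results = search(docs, "deploy")
```

In this toy data set the SharePoint page is a copy of the Confluence page, so only the higher-scoring copy should survive; the archived page is dropped by the path filter and the lunch page never matches. Ideally the product would expose equivalents of these knobs as configuration rather than code.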