
I am looking for very robust search engine software to integrate into a .NET website.

The currently proposed solution is Lucene.NET, a port of Lucene. However, I would like to evaluate other search engines before making up my mind.

The feature set we need is the following:

  • Ability to crawl arbitrary pages via HTTP
  • Ability to parse sitemaps
  • Ability to get lists of URIs to parse via a database look-up
  • Ability to restrict the search to a particular language/locale
  • Ability to restrict the search to a subset of the pages (e.g. via a regex on the URI)
  • Speed and scalability (this is for a public website with a ton of traffic)
  • Must have .NET API support or a super-easy HTTP-based API that can be wrapped in a .NET API
  • Language-dependent full-text support

Other things which would be great, but not deal-breakers if they aren't supported:

  • Reporting
  • Aliasing and biasing of results
  • HTTP-based administration pages
  • SQL Server support

What other software search engines have worked for you? Is there any you would recommend or that we should avoid?

Sklivvz

7 Answers

Lucene.Net is an information retrieval library, not a search engine. In particular it won't do any of:

  • Crawl web pages or parse sitemaps
  • Reporting
  • HTTP-based administration pages
  • SQL Server support (Lucene.Net uses its own simple but highly effective file format, and doesn't use SQL Server)

Although I'm a strong supporter of Lucene.Net and would recommend it as the full-text search component of a search engine, you will also need a crawler / HTML parser component in order to create a fully functional search engine. You are also going to have to carefully design your Lucene.Net indexes to maximise the performance of the queries that you want (e.g. searching by language/locale).
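To give a feel for what that crawler component involves, here is a minimal sketch of one small piece of it: reading a standard sitemaps.org `<urlset>` file and optionally keeping only URLs that match a regex (the "parse sitemaps" and "restrict by URI" requirements from the question). It is written in Python only for brevity; the same logic ports directly to C#. The function name `urls_from_sitemap` and the example URLs are hypothetical, and the sketch does not fetch anything over HTTP or follow `<sitemapindex>` files.

```python
import re
import xml.etree.ElementTree as ET
from typing import List, Optional

# Namespace defined by the sitemaps.org protocol, version 0.9.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def urls_from_sitemap(xml_text: str, uri_pattern: Optional[str] = None) -> List[str]:
    """Return the <loc> URLs listed in a <urlset> sitemap.

    If uri_pattern is given, only URLs for which re.search finds a match
    are kept, e.g. to index just one section or locale of a site.
    """
    root = ET.fromstring(xml_text)
    regex = re.compile(uri_pattern) if uri_pattern else None
    urls = []
    # Each <url> entry in the <urlset> root holds one <loc> child.
    for loc in root.findall("sm:url/sm:loc", SITEMAP_NS):
        url = (loc.text or "").strip()
        if url and (regex is None or regex.search(url)):
            urls.append(url)
    return urls


# Usage: keep only the English section of a (hypothetical) site.
sample = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/en/products</loc></url>
  <url><loc>https://example.com/fr/produits</loc></url>
</urlset>"""
print(urls_from_sitemap(sample, r"^https://example\.com/en/"))
```

The resulting URL list would then be fetched, parsed and fed into the Lucene.Net index by the rest of the (hypothetical) crawler.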

Try looking at the Solr project, a fully fledged search engine built on Lucene; it might be better suited to your needs.

Sklivvz
Justin

Check out Microsoft's Search Server Express (its product page looked broken at the time of writing).

There are other enterprise engines out there, such as Vivisimo Velocity (very extensible), Autonomy, etc. Lucene and Solr are limited and hard to use and configure, but that's what you get when you want something free.

shawnwall

You may also have a look at OpenSearchServer.

It runs like a charm on Windows, and you can use its SOAP web service for the integration.

There is also a C# skeleton library working with the XML/REST API.

Disclaimer: I am the CEO of OpenSearchServer

Emmanuel Keller
  • Thanks for posting your answer! Please be sure to read the [FAQ on Self-Promotion](http://stackoverflow.com/faq#promotion) carefully. Also note that it is *required* that you post a disclaimer every time you link to your own site/product. – Andrew Barber Oct 17 '12 at 12:33
  • Hi Andrew. Thank you for the notice. After carefully reading the FAQ, especially the "May I promote products or websites I am affiliated with here?" topic, I was not able to find any detail about that kind of disclaimer. What is good practice? – Emmanuel Keller Oct 17 '12 at 15:27
  • Adjust this for whatever your appropriate role is, but something like: "Disclaimer: I am the Lead of the OpenSearchServer team", posted right after the link or any mention of it. Note that this isn't important when someone is asking a question *specifically about* how to do something on any of your products, "How do I sprug my sprockets with OpenSearchServer?" - you can answer that sort of "Help" question without disclosure. – Andrew Barber Oct 17 '12 at 15:50
  • @AndrewBarber, you could simply fix it yourself, as I did - this site is collaboratively edited :-) – Sklivvz Oct 17 '12 at 18:13
  • @Sklivvz I was mostly hoping to get the point across for the future. I also was not initially 100% sure he was affiliated with it (just mostly sure! hehe). But that's a good point nonetheless :-) – Andrew Barber Oct 17 '12 at 18:57
  • @AndrewBarber I will do that now. Three comments: 1. I am not a marketer. 2. The OpenSearchServer website has been on my public profile from the start. 3. The answer was relevant. ;-) – Emmanuel Keller Oct 18 '12 at 06:26
  • @EmmanuelKeller re #3, I know; If it wasn't relevant, I would have simply flagged as spam and forgotten about it. re #1, #2; both of those are fine, of course. But note that the requirement is still that you disclose in your answers when you mention/link to the product/website, since many readers don't look closely at profiles or even user names. Anyway; keep that in mind, and keep your answers relevant, and there won't be any problem at all :) – Andrew Barber Oct 18 '12 at 09:05
  • (Incidentally; I'm checking out the product myself for possible future use) – Andrew Barber Oct 18 '12 at 09:06

I'd recommend checking out Solr. It's Java-based, but it meets the HTTP-based API part of your requirements, is designed to run on a separate box/cluster from your primary app (so you don't necessarily need Java AND .NET on the same hardware), and it has a lot of momentum. It's been a while since I worked with it, but I don't remember it providing its own crawler. If that's still the case, it should be straightforward to use a standalone crawler together with the aforementioned API to make it work.
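Because Solr exposes search over plain HTTP, the .NET side mostly needs to build a query URL and issue a GET (for example with `HttpClient`). A minimal sketch of that URL construction follows, in Python for brevity. The core URL `http://localhost:8983/solr/mycore` and the `language` field are hypothetical assumptions about your Solr setup and schema; `q`, `fq`, `rows` and `wt` are standard Solr request parameters.

```python
from typing import Optional
from urllib.parse import urlencode


def build_solr_select_url(base_url: str, user_query: str,
                          language: Optional[str] = None, rows: int = 10) -> str:
    """Build a Solr /select URL for a full-text query.

    Assumes a hypothetical schema in which each document has a `language`
    field (e.g. "en", "fr"). The language restriction goes in `fq`, a
    filter query, rather than in the scored main query `q`.
    """
    params = [("q", user_query), ("rows", str(rows)), ("wt", "json")]
    if language:
        # Assumes a simple language code; arbitrary input would need escaping.
        params.append(("fq", f"language:{language}"))
    return base_url.rstrip("/") + "/select?" + urlencode(params)


# Usage against a hypothetical local core:
print(build_solr_select_url("http://localhost:8983/solr/mycore", "full text", language="en"))
```

Keeping the locale restriction in `fq` lets Solr cache it as a filter independently of relevance scoring, which suits a high-traffic site where the same few locale filters are reused constantly.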

Hank Gay

Instead of using Lucene.Net directly, have you considered using something that wraps it and provides more functionality akin to what you're after?

Solr is an Apache product that does this, and there is also a .NET client for it. I've never used it in production, but it looks like the kind of thing you're after.

Along similar lines is Nutch (written by the guy who originally wrote Lucene), although I'm not aware of any .NET version of it. Nutch does have a spider component to crawl sites.

adrianbanks

Like others have said, definitely go with the original Lucene using Solr. Integrating it with .NET is super simple; I actually blogged about this recently: http://crazorsharp.blogspot.com/2010/01/full-text-search-using-solr-lucene-and.html

BFree

Coveo is the search engine we are currently putting in place to replace a Google Mini that we used for a number of years. I'm just pointing these out as something to explore, as I haven't used either enough to know how good they are. I just know of headaches with each, many, many headaches.

JB King