I would like to implement a search engine that crawls a set of web sites, extracts specific information from the pages, and creates a full-text index of that information.

It seems to me that Xapian could be a good choice for the search engine library.

What are the options for a crawler/parser to integrate with Xapian?

Would Solr be a better choice than Xapian to integrate with open source crawlers/parsers?

Enrico Detoma

2 Answers


Here's a little comparison between Xapian and Solr.

But if you want to build a crawler, take a look at Nutch. It's extensible with plugins, so you could write a plugin that analyzes the information that you're looking for.
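To make the extract-then-index step concrete, here is a minimal sketch in plain Python. It is not Nutch's plugin API, nor Xapian's or Solr's; it only illustrates the idea of keeping selected parts of each page and indexing just those. The choice of `<title>` and `<h1>` as the "specific information", and the names `FieldExtractor`, `build_index`, and `search`, are assumptions made up for this example.

```python
from collections import defaultdict
from html.parser import HTMLParser
import re


class FieldExtractor(HTMLParser):
    """Collects text found inside <title> and <h1> tags.

    Which tags count as "specific information" is an assumption
    for this sketch; a real extractor would target whatever the
    crawled sites actually use.
    """

    WANTED = {"title", "h1"}

    def __init__(self):
        super().__init__()
        self._open = []    # wanted tags that are currently open
        self.chunks = []   # extracted text fragments

    def handle_starttag(self, tag, attrs):
        if tag in self.WANTED:
            self._open.append(tag)

    def handle_endtag(self, tag):
        if self._open and self._open[-1] == tag:
            self._open.pop()

    def handle_data(self, data):
        # Keep text only while inside one of the wanted tags.
        if self._open and data.strip():
            self.chunks.append(data.strip())


def extract_text(html):
    """Return the extracted fields of one page as a single string."""
    parser = FieldExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


def build_index(pages):
    """Build a toy inverted index: term -> set of URLs.

    `pages` maps URL -> raw HTML (as a crawler might hand it over).
    """
    index = defaultdict(set)
    for url, html in pages.items():
        for term in re.findall(r"[a-z0-9]+", extract_text(html).lower()):
            index[term].add(url)
    return index


def search(index, query):
    """Return the URLs whose extracted text contains every query term."""
    terms = re.findall(r"[a-z0-9]+", query.lower())
    if not terms:
        return set()
    results = set(index.get(terms[0], set()))
    for term in terms[1:]:
        results &= index.get(term, set())
    return results
```

In a real setup the crawler would supply the pages, and the toy inverted index would be replaced by the search library you pick; only the extracted text would be handed to it for indexing.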

Mauricio Scheffer

Flax may provide some of what you're looking for.

mmmmmm
Rob Young