
I'm trying to build a search engine for my final-year project. I have done a lot of research on this topic over the last two months, and I found that I will need a crawler to crawl the Internet, a parser, and an indexer.

I am trying to use Nutch as the crawler and Solr to index the data crawled by Nutch, but I am stuck on the installation of both. I have tried to install Nutch and Solr on my system with the help of tutorials from the internet, but nothing has worked for me.

I need some kind of installation guide, or a link to one, that explains how to install and integrate Nutch and Solr.

Next, I am stuck on the parser. I have no idea about this phase, and I need help on how to parse the data before indexing it.

I don't want to build Google or anything like it. All I need is for certain items from certain websites to be searchable.

I have Java experience and can work with it comfortably, but I am not a professional like you guys. Please tell me whether I am going in the right direction or not, and what I should do next.

I am using Ubuntu 10.10, and I have Apache Tomcat 7.

Nipun David

1 Answer


This is for Nutch installation and this is for integration with Solr.
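Once both are installed, the crawl-and-index workflow for the Nutch 1.x releases of that era is driven from the command line. A minimal sketch is below; the seed URL, Solr address, depth, and topN values are placeholders you would adjust, and the exact flags depend on your Nutch version:

```shell
# Create a seed list of URLs for Nutch to start crawling from
mkdir -p urls
echo "http://www.example.com/" > urls/seed.txt

# Nutch 1.3+: crawl and push the results straight into a running Solr instance
# -depth  : how many link hops to follow from the seeds
# -topN   : max pages fetched per level
bin/nutch crawl urls -solr http://localhost:8983/solr/ -depth 3 -topN 50

# Older Nutch 1.2: crawl first, then index the segments into Solr separately
# bin/nutch crawl urls -dir crawl -depth 3 -topN 50
# bin/nutch solrindex http://localhost:8983/solr/ crawl/crawldb crawl/linkdb crawl/segments/*
```

After this finishes, you can query the crawled pages through Solr's normal search interface (e.g. `http://localhost:8983/solr/select?q=yourterm`).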

Regarding the parsers: Nutch has its own set of parsers, so you don't have to worry about parsing. Trigger the crawl command and it is done automatically. Unless you want to parse content types beyond those Nutch already provides, this won't be an issue for you. If you want Nutch to parse some .xyz files, then you need to write a parser plugin for that and integrate it with Nutch.
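For the custom-parser case, a Nutch plugin is registered through a `plugin.xml` descriptor that hooks your class into the `org.apache.nutch.parse.Parser` extension point. A sketch is below; the plugin id, jar name, class name (`org.example.parse.XyzParser`), and MIME type are all hypothetical placeholders for your own ".xyz" format:

```xml
<!-- plugin.xml for a hypothetical parse-xyz plugin (names are examples) -->
<plugin id="parse-xyz" name="XYZ Parse Plugin" version="1.0.0" provider-name="example.org">
  <runtime>
    <!-- the jar containing your Parser implementation -->
    <library name="parse-xyz.jar">
      <export name="*"/>
    </library>
  </runtime>
  <requires>
    <import plugin="nutch-extensionpoints"/>
  </requires>
  <!-- register the class against Nutch's Parser extension point -->
  <extension id="org.example.parse.xyz" name="XYZ Parser"
             point="org.apache.nutch.parse.Parser">
    <implementation id="org.example.parse.XyzParser"
                    class="org.example.parse.XyzParser">
      <!-- content type(s) this parser should handle -->
      <parameter name="contentType" value="application/x-xyz"/>
      <parameter name="pathSuffix" value="xyz"/>
    </implementation>
  </extension>
</plugin>
```

You would also add the plugin id to the `plugin.includes` property in `nutch-site.xml` so Nutch loads it during the parse phase.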

Tejas Patil