I'm trying to build a search engine for my final-year project. I have spent the last two months researching the topic, and I found that I will need three components: a crawler to crawl the Internet, a parser, and an indexer.
I am trying to use Nutch as the crawler and Solr to index the data Nutch crawls, but I am stuck on installing both of them. I have followed several tutorials on the Internet to install Nutch and Solr on my system, but none of them worked for me.
Can anyone point me to an installation guide, or a link, that explains how to install Nutch and Solr and integrate them?
Next, I am stuck on the parser. I have no idea how this phase works, and I need help with how to parse the crawled data before indexing it.
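To make the question concrete, here is my rough understanding of what the parse step should do, written as a minimal sketch using only the JDK's built-in `javax.swing.text.html.parser.ParserDelegator` (not Nutch's own parsing code): take the raw HTML of a crawled page and split it into a title and plain body text, which an indexer could then store as separate fields. The names `HtmlToText` and `ParsedPage` are my own hypothetical names, and skipping `<script>`/`<style>` text is just an assumption about what should not be indexed.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

public class HtmlToText {

    /** Hypothetical container for the fields an indexer would receive. */
    static final class ParsedPage {
        final String title;
        final String body;

        ParsedPage(String title, String body) {
            this.title = title;
            this.body = body;
        }
    }

    /** Extracts the <title> text and the remaining visible text from raw HTML. */
    static ParsedPage parse(String html) {
        final StringBuilder title = new StringBuilder();
        final StringBuilder body = new StringBuilder();

        HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {
            private boolean inTitle = false;
            private boolean inSkipped = false; // inside <script> or <style>

            @Override
            public void handleStartTag(HTML.Tag tag, MutableAttributeSet attrs, int pos) {
                if (tag == HTML.Tag.TITLE) inTitle = true;
                if (tag == HTML.Tag.SCRIPT || tag == HTML.Tag.STYLE) inSkipped = true;
            }

            @Override
            public void handleEndTag(HTML.Tag tag, int pos) {
                if (tag == HTML.Tag.TITLE) inTitle = false;
                if (tag == HTML.Tag.SCRIPT || tag == HTML.Tag.STYLE) inSkipped = false;
            }

            @Override
            public void handleText(char[] data, int pos) {
                if (inSkipped) return; // assumption: code/CSS is not searchable text
                StringBuilder target = inTitle ? title : body;
                if (target.length() > 0) target.append(' ');
                target.append(data);
            }
        };

        try {
            // true = ignore any charset declared inside the document
            new ParserDelegator().parse(new StringReader(html), callback, true);
        } catch (IOException e) {
            throw new RuntimeException(e); // a StringReader should not fail in practice
        }
        return new ParsedPage(title.toString().trim(), body.toString().trim());
    }

    public static void main(String[] args) {
        String html = "<html><head><title>Blue Widget</title></head>"
                + "<body><h1>Blue Widget</h1><p>Price: 10 USD</p></body></html>";
        ParsedPage page = parse(html);
        System.out.println("title = " + page.title);
        System.out.println("body  = " + page.body);
    }
}
```

Is this roughly what the parsing phase is supposed to produce before the data goes to the indexer?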
I don't want to build the next Google. All I need is to make certain items from certain websites searchable.
I have Java experience and can work with it comfortably, but I am not a professional like you guys. Please tell me whether I am going in the right direction and what I should do next.
I am using Ubuntu 10.10, and I have Apache Tomcat 7.