I am working on a project where I need to crawl more than 10 TB of data and index it. I also need incremental crawling, so that re-crawls take less time than the initial full crawl.
My question is: which tool is best suited for this with Java, ideally one that large organizations already use?
I tried this with Solr and ManifoldCF, but I could find very little ManifoldCF documentation online.