We run Solr on an Amazon Web Services EC2 instance, with the index stored on a 1 TB EBS volume so that we can easily launch additional servers with the same (read-only) index. However, our index will soon exceed 1 TB, and I don't really want to deal with striping multiple EBS volumes to hold it. Regenerating the index is also very slow. I would like to move index generation (and maybe hosting) to Hadoop, preferably to Amazon's Elastic MapReduce, although I can set up separate Hadoop servers if need be. We use RightScale, so their library of ServerTemplates is available to us.
What would be the best place to get started using Lucene/Solr on Hadoop?