I am interested in using Lucene.NET in an application that runs on Windows clusters. The search problem itself is fairly small, but the stateless/cluster problem still has to be handled.
I understand that Solr handles my scenario (and more), but requiring a servlet container (and Java) poses some problems for me. Depending on how complex a Lucene.NET-based approach turns out to be, it may still be a viable option, though.
My question is: what options do I have for running on multiple hosts?
Persist the index on shared storage common to all nodes? Would Lucene.NET handle concurrency transparently? Would servers use RAM for caching, and if so, does Lucene.NET transparently invalidate that cache when the underlying files are updated?
Replication? Each server has its own copy of everything it needs. On any update, all servers get a new replica (or a diff, if that is reasonably simple). Are there existing tools for this, or is it up to me to handle?
Workload partitioning/sharding? Each server handles only its own data, for both reads and updates. Are there tools for handling this, joining partial results, etc.?
Are there other options I may have missed in my initial investigation?
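For the replication option, a diff can be fairly simple because Lucene writes index files once and does not modify them afterwards; an update adds new segment files and a new commit point instead. A replica could therefore be synced by copying only the files it lacks and then deleting the ones the master no longer has. Below is a minimal sketch in Python (not .NET), assuming the master's files are read from a stable, committed state. `build_manifest` and `plan_sync` are hypothetical helpers, not Lucene.NET APIs, and the copy order (data files first, then `segments_N`, then `segments.gen`, which older index formats write) is an assumption meant to keep a replica from seeing a commit point before its segment files have arrived.

```python
import hashlib
import os
from typing import Dict, List, Tuple

# Hypothetical manifest format: file name -> SHA-256 hex digest of its contents.
Manifest = Dict[str, str]


def build_manifest(index_dir: str) -> Manifest:
    """Hash every regular file in a local index directory."""
    manifest: Manifest = {}
    for name in sorted(os.listdir(index_dir)):
        path = os.path.join(index_dir, name)
        if os.path.isfile(path):
            with open(path, "rb") as f:
                manifest[name] = hashlib.sha256(f.read()).hexdigest()
    return manifest


def _copy_rank(name: str) -> int:
    # Assumption: commit-point files go last so a half-copied commit is never visible.
    if name == "segments.gen":
        return 2
    if name.startswith("segments_"):
        return 1
    return 0


def plan_sync(master: Manifest, replica: Manifest) -> Tuple[List[str], List[str]]:
    """Return (files to copy, files to delete) that make the replica match the master."""
    to_copy = [name for name, digest in master.items() if replica.get(name) != digest]
    to_copy.sort(key=lambda name: (_copy_rank(name), name))
    to_delete = sorted(name for name in replica if name not in master)
    return to_copy, to_delete
```

Deletions would run only after the copies complete. On Windows, deleting a file that a searcher on the replica still holds open may fail, so those deletions might need to be retried after readers have been reopened.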
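For the sharding option, the two parts that would likely need custom code are routing each document to exactly one shard (so all updates for it land in the same place) and merging the per-shard hits into one result list. The following is a minimal Python sketch, assuming each shard returns its own top-k hits already sorted by descending score. `shard_for` and `merge_shard_results` are hypothetical names. One caveat: Lucene's scores depend on per-index term statistics such as document frequency, so scores from different shards are only approximately comparable, and the approximation improves as documents are spread more evenly.

```python
import hashlib
import heapq
import itertools
from typing import Iterable, List, Tuple

# Hypothetical hit format: (score, shard name, document id).
Hit = Tuple[float, str, str]


def shard_for(doc_id: str, num_shards: int) -> int:
    """Map a document id to a shard with a hash that is stable across processes."""
    digest = hashlib.sha1(doc_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def merge_shard_results(shard_hits: Iterable[List[Hit]], k: int) -> List[Hit]:
    """Merge per-shard lists (each sorted by descending score) into a global top-k."""
    merged = heapq.merge(*shard_hits, key=lambda hit: hit[0], reverse=True)
    return list(itertools.islice(merged, k))
```

The sketch uses a hashlib digest for routing rather than Python's built-in `hash()`, because string hashing is randomized per process by default. With `hash()`, the same document could be routed to different shards after a restart.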
When experimenting with a local version, my Lucene directory was on the order of a couple hundred MB. Longer term, I can see it growing to perhaps 1-5 GB. If update frequency turns out to be a problem, I can control it fairly flexibly. Concurrent read/search load is expected to be very moderate.