I am expected to implement a storage and search solution for a large dataset of more than 4 million documents. Each document will have 40 or more fields (search criteria).
I have worked with Lucene and Solr before, so I am inclined to use them for this problem (any other ideas and solutions are welcome, of course). What bugs me is how to make the storage efficient and scalable. I have been looking at Cassandra, MongoDB and some other NoSQL solutions, but I couldn't determine which technology would best fit the requirement.
I would like to ask whether anyone has faced a similar problem and, if so, what they used to solve it.