Storing and searching 4+ million documents

Question

I am expected to implement a storage and search solution for a large dataset of more than 4 million documents. Each document will have 40 or more fields (search criteria).

I have worked with Lucene and Solr before, so I tend to use them for this problem (other ideas and solutions are welcome, of course). What bugs me is efficient and scalable storage. I have been looking at Cassandra, MongoDB and some other NoSQL solutions, but I couldn't decide which technology would best fit the requirement.
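The question separates two concerns: searching across 40+ fields (what Lucene/Solr do) and durably storing the full documents (what Cassandra or MongoDB would do). A minimal in-memory sketch of that split follows, assuming exact-match criteria and hashable field values. The class name `ToyStoreWithIndex` and the sample fields are hypothetical; this is not the API of Lucene, Solr, Cassandra or MongoDB, and it omits tokenization, relevance scoring, persistence and sharding, which are what those systems actually contribute at the 4-million-document scale.

```python
from __future__ import annotations

from collections import defaultdict
from typing import Any


class ToyStoreWithIndex:
    """Hypothetical sketch: primary document storage plus per-field inverted indexes."""

    def __init__(self) -> None:
        # Primary storage: document id -> full document (stands in for the NoSQL store).
        self.docs: dict[str, dict[str, Any]] = {}
        # Search side: field name -> field value -> set of matching document ids
        # (a toy stand-in for a Lucene/Solr index). Values must be hashable.
        self.index: dict[str, dict[Any, set[str]]] = defaultdict(lambda: defaultdict(set))

    def put(self, doc_id: str, doc: dict[str, Any]) -> None:
        # If the document is being replaced, drop its old postings first.
        if doc_id in self.docs:
            self._unindex(doc_id)
        self.docs[doc_id] = doc
        for field, value in doc.items():
            self.index[field][value].add(doc_id)

    def _unindex(self, doc_id: str) -> None:
        for field, value in self.docs[doc_id].items():
            self.index[field][value].discard(doc_id)

    def search(self, **criteria: Any) -> list[dict[str, Any]]:
        """Return documents matching every field=value criterion (AND semantics)."""
        if not criteria:
            return []
        # Look up one posting set per criterion; .get() avoids creating empty entries.
        postings = sorted(
            (self.index.get(f, {}).get(v, set()) for f, v in criteria.items()),
            key=len,
        )
        # Intersect starting from the smallest set to keep the work small.
        matched = set(postings[0])
        for p in postings[1:]:
            matched &= p
        # Fetch the full documents from primary storage, ordered by id.
        return [self.docs[d] for d in sorted(matched)]


if __name__ == "__main__":
    store = ToyStoreWithIndex()
    store.put("1", {"type": "invoice", "country": "DE", "year": 2012})
    store.put("2", {"type": "invoice", "country": "US", "year": 2012})
    store.put("3", {"type": "memo", "country": "DE", "year": 2011})
    print(store.search(type="invoice", year=2012))
```

In a real deployment the `index` role would be played by Solr/Lucene and the `docs` role by whichever store is chosen, with the search engine returning ids and the store serving the full documents; keeping the two in sync on updates is the part `put` only hints at.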

I would like to ask whether anyone has faced a similar issue and what they used to solve it.

Recommendation questions are off-topic here, and on Stack Exchange sites in general. — , Aug 03 '12 at 02:54
This is an open question without anything specific being asked. Check the guidelines. — DallaRosa, Aug 03 '12 at 03:10

score 1 · Accepted Answer · edited Apr 27 '22 at 11:04


Check this survey paper for general reference:

Survey of document-oriented data stores, with some metrics:
http://cattell.net/datastores/Datastores.pdf

For IEEE subscribers:

NoSQL evaluation: A use case oriented survey
http://www.computer.org/portal/web/csdl/doi/10.1109/CSC.2011.6138544

edited Apr 27 '22 at 11:04 by Glorfindel


answered Aug 03 '12 at 02:59 by Edmon

