I'm wondering if Berkeley DB JE is a suitable choice for storing simple key/value pairs for 100M documents.
I need to fetch a single document from BDB in under 75 ms.
Thanks in advance
Why not use Apache Lucene, an open source information retrieval engine? I would use Lucene to keep an index that maps keywords to document ids. You can then post a keyword (or a set of keywords) to Lucene, get back the id of a document, and retrieve the document from Berkeley DB.
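To make the two-step lookup concrete, here is a minimal sketch of the flow only. It deliberately uses plain Java maps as stand-ins: `keywordToIds` plays the role of the Lucene index and `idToDocument` plays the role of the Berkeley DB store. The class and method names are hypothetical, and none of this is Lucene or BDB JE API.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Illustration only: in-memory maps stand in for Lucene and Berkeley DB.
final class KeywordLookupSketch {
    // Stand-in for the keyword index (Lucene's job): keyword -> document ids.
    private final Map<String, Set<String>> keywordToIds = new HashMap<>();
    // Stand-in for the key/value store (Berkeley DB's job): id -> document.
    private final Map<String, String> idToDocument = new HashMap<>();

    // Store a document under its id and index it under each keyword.
    void put(String id, String document, String... keywords) {
        idToDocument.put(id, document);
        for (String keyword : keywords) {
            keywordToIds.computeIfAbsent(keyword, k -> new LinkedHashSet<>()).add(id);
        }
    }

    // Step 1: keyword -> ids. Step 2: each id -> document.
    List<String> search(String keyword) {
        List<String> documents = new ArrayList<>();
        for (String id : keywordToIds.getOrDefault(keyword, Collections.emptySet())) {
            documents.add(idToDocument.get(id));
        }
        return documents;
    }

    public static void main(String[] args) {
        KeywordLookupSketch sketch = new KeywordLookupSketch();
        sketch.put("doc-1", "first document body", "storage", "java");
        sketch.put("doc-2", "second document body", "java");
        System.out.println(sketch.search("java"));
    }
}
```

Note that if you already know the document's key, the second step alone is enough; the keyword index only matters when you need to search by content.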
You may want to discuss your performance requirements on the Berkeley DB Java Edition discussion forum. The main question is going to end up being "How many I/Os do you need to perform in order to get to the data?" If the answer is "none", then 75 ms response time is a piece of cake. If the answer is "many" then it will depend on how many "many" is and the speed of your disk drive.
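As a rough way to reason about that question, the following back-of-the-envelope sketch models a fetch as a fixed in-memory cost plus one fixed cost per disk read. The helper names are hypothetical, and the figures passed in `main` (10 ms per random read, 1 ms of in-memory work) are illustrative assumptions, not measurements of BDB JE or of any particular drive.

```java
// Back-of-the-envelope latency model; all numbers are caller-supplied assumptions.
final class FetchLatencyEstimate {
    // Estimated fetch time: in-memory work plus one fixed cost per disk read.
    static double estimateMs(int diskReads, double msPerRead, double inMemoryMs) {
        return inMemoryMs + diskReads * msPerRead;
    }

    // Largest number of disk reads that still fits inside the latency budget.
    static int maxReadsWithinBudget(double budgetMs, double msPerRead, double inMemoryMs) {
        double remaining = budgetMs - inMemoryMs;
        if (remaining < 0) {
            return 0; // the in-memory work alone already exceeds the budget
        }
        return (int) Math.floor(remaining / msPerRead);
    }

    public static void main(String[] args) {
        double msPerRead = 10.0;  // assumed cost of one random disk read
        double inMemoryMs = 1.0;  // assumed cost of the in-memory part of a fetch
        System.out.println("0 reads: " + estimateMs(0, msPerRead, inMemoryMs) + " ms");
        System.out.println("3 reads: " + estimateMs(3, msPerRead, inMemoryMs) + " ms");
        System.out.println("reads allowed in 75 ms: "
                + maxReadsWithinBudget(75.0, msPerRead, inMemoryMs));
    }
}
```

Plugging in numbers measured on your own hardware will tell you how many I/Os per fetch you can afford before the 75 ms target is at risk.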
There are some excellent quick references on the BDB JE FAQ page. In particular, this one may be of immediate use. Basically, you want to size your cache so at least all of the Index Nodes fit in memory. If the Index Nodes fit in memory, then you'll have to do at most one I/O to get to the data (Leaf Node) unless it's already in the cache.
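To get a feel for the cache size this implies, here is a rough sketch that counts the internal (index) nodes of a B-tree holding a given number of records. It assumes every node is completely full, which is optimistic, and both the fan-out and the bytes-per-node figure used in `main` are illustrative assumptions rather than JE's actual numbers. The helper names are hypothetical. JE ships a `DbCacheSize` utility that should give a far more realistic estimate for your actual key and data sizes.

```java
// Rough estimate of index-node count and memory; not JE's real sizing logic.
final class IndexCacheEstimate {
    // Counts internal nodes level by level, assuming every node is full.
    static long internalNodeCount(long records, int fanout) {
        if (records <= 0) {
            return 0;
        }
        long total = 0;
        long nodes = records;
        do {
            nodes = (nodes + fanout - 1) / fanout; // ceiling division: parents of this level
            total += nodes;
        } while (nodes > 1);
        return total;
    }

    // Memory needed to keep all internal nodes cached, for an assumed node size.
    static long cacheBytes(long records, int fanout, long bytesPerNode) {
        return internalNodeCount(records, fanout) * bytesPerNode;
    }

    public static void main(String[] args) {
        long records = 100_000_000L; // from the question: 100M documents
        int fanout = 128;            // assumed entries per node
        long bytesPerNode = 4096L;   // assumed in-memory size of one internal node
        System.out.println("internal nodes: " + internalNodeCount(records, fanout));
        System.out.println("cache bytes:    " + cacheBytes(records, fanout, bytesPerNode));
    }
}
```

If the resulting figure fits comfortably in the memory you can give the JE cache, a fetch should cost at most one I/O for the leaf node.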