How to choose a NoSQL database engine?

Question

We have a database with the following parameters:

30k records, 7 MB in size
20 inserts/second
1000 updates/second
1000 range selects/second, by secondary index, approx 10 rows each
needs at least one secondary index
needs some mechanism to expire keys that have not been updated for 75 seconds (this can be done via a programmatic garbage collector, but that would require an additional 'last_update' index and would add some load)
consistency is not required
durability is not required
db should be stored in memory

For now we use Redis, but it does not have secondary indexes, and its `KEYS index:foo:*` command is too slow. Membase also does not have secondary indexes (as far as I know). MongoDB and the MySQL MEMORY engine have table-level locks. Which engine would fit our use case?

You don't need to write anything to disk or synchronize state anywhere, which are the two biggest complexities in database design, and your data set is all of 7 MB. Have you considered storing this yourself in the appropriate data structures? It shouldn't be hard to get that level of performance in whatever language you're using. — willglynn, Oct 19 '12 at 14:37
What do you mean by "secondary index"? Many NoSQL techs such as MongoDB use only one index per query unless it's an `$or`, so secondary indexes for querying don't really work. Also, MongoDB is not stored in memory, so it is out of the question here. — Sammaye, Oct 19 '12 at 15:14
@Sammaye What's typically meant by "secondary index" is any index beyond the primary index. So for MongoDB, this is any index beyond the one on `_id`. — JohnnyHK, Oct 19 '12 at 16:10
@JohnnyHK Aha, thanks. So many terms are used for the same thing that sometimes it's hard to keep track of them all :) — Sammaye, Oct 19 '12 at 16:48
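To make the first comment's suggestion concrete, here is a minimal single-threaded Python sketch of such a hand-rolled store. It assumes records are dicts keyed by a comparable id, that the secondary index is on one sortable field (called `foo` here, a hypothetical name echoing the `index:foo:*` pattern in the question), and that "expire" means "not updated for 75 seconds". The primary index is a dict, the secondary index is a sorted list of `(foo, id)` pairs searched with `bisect`, and an insertion-ordered dict of last-update times plays the role of the 'last_update' index, so stale rows can be swept from the oldest end.

```python
import bisect
import time
from collections import OrderedDict


class InMemoryStore:
    """Hypothetical hand-rolled store: a primary dict, one sorted secondary
    index on the field "foo", and expiry of rows not updated for `ttl` seconds."""

    def __init__(self, ttl=75.0, clock=time.monotonic):
        self.ttl = ttl
        self.clock = clock                 # injectable so tests can fake time
        self.rows = {}                     # id -> record (primary index)
        self.by_foo = []                   # sorted (foo, id) pairs (secondary index)
        self.last_update = OrderedDict()   # id -> timestamp, oldest first

    def _unindex(self, rec_id):
        # Remove the existing (foo, id) entry for this row from the sorted index.
        key = (self.rows[rec_id]["foo"], rec_id)
        del self.by_foo[bisect.bisect_left(self.by_foo, key)]

    def upsert(self, rec_id, record):
        if rec_id in self.rows:
            self._unindex(rec_id)
        self.rows[rec_id] = dict(record)   # copy so later caller edits can't desync the index
        bisect.insort(self.by_foo, (record["foo"], rec_id))
        # Refresh the timestamp and move the id to the "newest" end.
        self.last_update[rec_id] = self.clock()
        self.last_update.move_to_end(rec_id)

    def delete(self, rec_id):
        self._unindex(rec_id)
        del self.rows[rec_id]
        del self.last_update[rec_id]

    def range_select(self, lo, hi):
        """Return records whose foo satisfies lo <= foo < hi."""
        # (lo,) sorts before every (lo, id) pair, so bisect_left finds the first match.
        i = bisect.bisect_left(self.by_foo, (lo,))
        j = bisect.bisect_left(self.by_foo, (hi,))
        return [self.rows[rec_id] for _, rec_id in self.by_foo[i:j]]

    def expire(self):
        """Drop rows not updated for `ttl` seconds; return how many were dropped."""
        cutoff = self.clock() - self.ttl
        dropped = 0
        while self.last_update:
            rec_id, ts = next(iter(self.last_update.items()))
            if ts > cutoff:
                break                      # every remaining row is newer than this one
            self.delete(rec_id)
            dropped += 1
        return dropped
```

Calling `expire()` once a second or so from the main loop would cover the 75-second requirement. Inserting into and deleting from a Python list is O(n), but for about 30k entries that is a small memory move; a balanced tree or skip list would be the usual choice if the data set grew much larger.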

score 2 · Answer 1 · answered Oct 19 '12 at 16:03

I think Redis does indeed support secondary indexes, but they are not implicit: you have to maintain them explicitly. A sorted set can be used for a range scan, but it is less convenient than what you could have in MongoDB or a relational database.
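A minimal sketch of maintaining such indexes by hand, assuming a redis-py client created with `redis.Redis(decode_responses=True)`, a numeric field called `foo`, and hypothetical key names `row:<id>`, `idx:foo` and `idx:last_update`. A second sorted set, scored by timestamp, doubles as the 'last_update' index from the question, so stale rows can be found with one range query instead of a separate scan. Timestamps are passed in explicitly (for example `time.time()`) to keep the functions easy to test.

```python
from typing import Any, List

TTL = 75  # seconds without an update before a row counts as expired


def upsert(r: Any, rec_id: str, foo: float, now: float) -> None:
    """Write the row and update both hand-maintained secondary indexes."""
    r.hset(f"row:{rec_id}", mapping={"foo": foo})
    r.zadd("idx:foo", {rec_id: foo})            # score = indexed value
    r.zadd("idx:last_update", {rec_id: now})    # score = last update time


def range_select(r: Any, lo: float, hi: float) -> List[str]:
    """Ids whose foo lies in [lo, hi]; ZRANGEBYSCORE bounds are inclusive."""
    return r.zrangebyscore("idx:foo", lo, hi)


def expire(r: Any, now: float) -> List[str]:
    """Delete every row whose last update is at least TTL seconds old."""
    stale = r.zrangebyscore("idx:last_update", "-inf", now - TTL)
    for rec_id in stale:
        r.delete(f"row:{rec_id}")
        r.zrem("idx:foo", rec_id)
        r.zrem("idx:last_update", rec_id)
    return stale
```

In practice you would probably send each function's commands through `r.pipeline()` (or a Lua script) so a row and its index entries change together and round trips are reduced; that is left out here to keep the sketch short. Without `decode_responses=True`, redis-py returns the ids as `bytes`.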

If you are looking for a Redis-like server with better support for range scans, I would suggest having a look at the discontinued AlchemyDB project, or the Aerospike 3 (commercial) product, which includes some of AlchemyDB's features.

edited Dec 24 '13 at 10:40

answered Oct 19 '12 at 16:03

Didier Spezia


Aerospike is not open source. The whole point of supporting secondary indexes is that they are maintained automatically. – Kostja Apr 30 '13 at 09:14

score 2 · Answer 2 · answered Dec 14 '12 at 09:43


Try using http://tarantool.org/. It has secondary indexes and is completely in-memory. It also uses the fast, asynchronous IProto protocol.

It has proven stable and incredibly fast.

answered Dec 14 '12 at 09:43

deepwalker


score 0 · Answer 3 · answered Oct 19 '12 at 21:28


I was thinking he meant 7 MB per record. One other option is PlayOrm on top of Cassandra, using its Scalable-SQL. More machines can result in better performance, as the disks would be working in parallel on range scans and similar queries.

answered Oct 19 '12 at 21:28

Dean Hiller


score 0 · Answer 4 · answered Oct 20 '12 at 05:30

I presume that if you can achieve the performance requirements you have specified, the fact that a DB is or is not in memory would be irrelevant.

Your performance goals are well within the capabilities of a single, non-replicated and non-sharded MongoDB instance. Mongo uses memory-mapped files, so all the data will be in memory, but the DB will flush to disk continuously. By default, Mongo uses "unsafe" mode, which eliminates much of the burden of disk I/O. It's worth considering for your use case, as opposed to trying to do in application code what a DB should do for you.

Adding replicas (Mongo's term for clustering) and/or sharding will give you a chance to easily increase performance should you need to. Multiple indexes (including composite indexes), flexible querying, bulk inserts and atomic updates are nice features that help with performance and can take the burden off your application code.
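As a rough illustration of those features for this use case, here is a sketch assuming a pymongo collection (for example the hypothetical `MongoClient().mydb.records`) and a numeric field `foo`. It creates a secondary index for the range selects and a TTL index on a `last_update` date field; TTL indexes exist since MongoDB 2.2, and the server's background task removes expired documents roughly once a minute, so a row may outlive the 75 seconds by up to about a minute. Updates use an atomic `$set` with `upsert=True`.

```python
from datetime import datetime, timezone
from typing import Any, Dict, List

TTL_SECONDS = 75


def ensure_indexes(coll: Any) -> None:
    # Secondary index for range selects (1 == pymongo.ASCENDING).
    coll.create_index([("foo", 1)])
    # TTL index: mongod deletes documents whose last_update (a BSON date)
    # is older than TTL_SECONDS; the cleanup pass runs about once a minute.
    coll.create_index([("last_update", 1)], expireAfterSeconds=TTL_SECONDS)


def upsert(coll: Any, rec_id: Any, foo: float) -> None:
    # Atomic single-document update; refreshing last_update postpones expiry.
    coll.update_one(
        {"_id": rec_id},
        {"$set": {"foo": foo, "last_update": datetime.now(timezone.utc)}},
        upsert=True,
    )


def range_select(coll: Any, lo: float, hi: float) -> List[Dict[str, Any]]:
    # Served by the index on foo; returns documents with lo <= foo < hi.
    return list(coll.find({"foo": {"$gte": lo, "$lt": hi}}))
```

The TTL index removes the need for the programmatic garbage collector mentioned in the question, at the cost of coarser expiry timing.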

score 0 · Answer 5 · answered Oct 11 '22 at 15:47

There are four main types of NoSQL databases. You could check which one meets your requirements.

1. Document storage database. The atomic unit of a document storage database is a document. Each document is a JSON object. Different documents can have different schemas and contain different fields. A document storage database allows certain fields in the document to be indexed, enabling faster queries based on these fields.

2. Column storage database. The atomic unit of a column storage database is a column in the table, which means that the data is stored by column. This makes column-based queries very efficient, and because the values in each column have almost the same structure, the data can be compressed better.

3. Key-value storage database. It's fast, first because data is accessed through unique keys, and second because many key-value storage databases store data in memory (RAM), which can be accessed quickly.

4. Graph storage database. A graph storage database contains nodes that represent entities and edges that represent relationships between entities.
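To illustrate point 2, here is a small Python sketch with made-up sample data that pivots row-oriented records into one list per column, then run-length encodes a column. Run-length encoding is only one simple compression scheme, chosen here for illustration; real column stores use a variety of encodings.

```python
from itertools import groupby
from typing import Any, Dict, List, Tuple

# Made-up sample records, stored row by row.
rows: List[Dict[str, Any]] = [
    {"id": 1, "country": "DE", "status": "active"},
    {"id": 2, "country": "DE", "status": "active"},
    {"id": 3, "country": "FR", "status": "active"},
]


def to_columns(records: List[Dict[str, Any]]) -> Dict[str, List[Any]]:
    """Pivot row-oriented records into one list of values per column."""
    return {key: [rec[key] for rec in records] for key in records[0]}


def run_length_encode(values: List[Any]) -> List[Tuple[Any, int]]:
    """Collapse runs of equal adjacent values into (value, count) pairs."""
    return [(value, len(list(group))) for value, group in groupby(values)]
```

A query such as "count rows per status" only needs to read `to_columns(rows)["status"]`, and because adjacent values in that column are often identical, it shrinks to the single pair `("active", 3)`.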
