4

We have a database with the following parameters:

  • 30k records, 7 MB in size
  • 20 inserts/second
  • 1000 updates/second
  • 1000 range selects/second, by secondary index, approx 10 rows each
  • needs at least one secondary index
  • needs some mechanism to expire keys if they are not updated for 75 seconds (can be done via a programmatic garbage collector, but that will require an additional 'last_update' index and will add some load)
  • consistency is not required
  • durability is not required
  • db should be stored in memory

For now we use Redis, but it does not have secondary indexes, and its `KEYS index:foo:*` is too slow. Membase also does not have secondary indexes (as far as I know). MongoDB and the MySQL memory engine have table-level locks. Which engine will fit our use case?
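For reference, the expiry requirement from the list above can be sketched in a few lines. This is a minimal in-process illustration, not a recommendation of a specific engine: the names `ExpiringStore`, `upsert`, and `sweep` are hypothetical, and it assumes a single process where every update goes through `upsert`. Keeping the records ordered by last update stands in for the separate 'last_update' index.

```python
import time
from collections import OrderedDict


class ExpiringStore:
    """Hypothetical in-memory store that drops records not updated within a TTL."""

    def __init__(self, ttl_seconds=75.0, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so the behaviour can be tested
        # key -> (last_update, record); least recently updated entries come first
        self._items = OrderedDict()

    def upsert(self, key, record):
        """Insert or update a record and mark it as freshly updated."""
        self._items[key] = (self._clock(), record)
        self._items.move_to_end(key)

    def get(self, key):
        """Return the record, or None if it is missing or already stale."""
        entry = self._items.get(key)
        if entry is None or self._clock() - entry[0] > self._ttl:
            return None
        return entry[1]

    def sweep(self):
        """Remove stale records from the old end; return the expired keys."""
        cutoff = self._clock() - self._ttl
        expired = []
        while self._items:
            key, (stamp, _) = next(iter(self._items.items()))
            if stamp >= cutoff:
                break  # everything after this entry is newer
            del self._items[key]
            expired.append(key)
        return expired
```

Because the entries are ordered by update time, `sweep` only touches records that are actually expired, so calling it once per second adds little load on top of 1000 updates/second.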

Community
  • 1
  • 1
Poma
  • 8,174
  • 18
  • 82
  • 144
  • 3
    You don't need to write anything to disk or synchronize state anywhere, which are the two biggest complexities in database design, and your data set is all of 7 MB. Have you considered storing this yourself in the appropriate data structures? It shouldn't be hard to get that level of performance in whatever language you're using. – willglynn Oct 19 '12 at 14:37
  • +1 with @willglynn. Otherwise did you check Riak feature ? – Aurélien B Oct 19 '12 at 14:59
  • What do you mean by "secondary index"? Many NoSQL techs such as MongoDB use only one index per query unless it's an `$or`, so secondary indexes for querying don't really work. Also, MongoDB is not stored in memory, so it is out of the question here. – Sammaye Oct 19 '12 at 15:14
  • @Sammaye What's typically meant by "secondary index" is any index beyond the primary index. So for MongoDB, this is any index beyond the one on `_id`. – JohnnyHK Oct 19 '12 at 16:10
  • 1
    @JohnnyHK Aha, thanks. So many terminologies are used for the same thing that sometimes it's hard to get them all :) – Sammaye Oct 19 '12 at 16:48

5 Answers

2

I think Redis does indeed support secondary indexes, but they are not implicit: you have to maintain them explicitly. A sorted set can be used for a range scan, but it is less convenient than what you could have in MongoDB or a relational database.
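To illustrate what "maintaining the index explicitly" means, here is a minimal pure-Python sketch of the pattern. It only emulates a sorted set with the standard `bisect` module; the class and method names are hypothetical. With Redis itself, the corresponding operations would be `ZADD` on every write and `ZRANGEBYSCORE` for the range scan.

```python
import bisect


class SecondaryIndex:
    """Hypothetical sorted index of (indexed_value, primary_key) pairs."""

    def __init__(self):
        self._entries = []   # sorted list of (indexed_value, primary_key)
        self._current = {}   # primary_key -> indexed_value currently stored

    def put(self, key, value):
        """Index `key` under `value`, replacing any previous entry for it."""
        self.remove(key)
        bisect.insort(self._entries, (value, key))
        self._current[key] = value

    def remove(self, key):
        """Drop `key` from the index if it is present."""
        if key in self._current:
            old = self._current.pop(key)
            i = bisect.bisect_left(self._entries, (old, key))
            del self._entries[i]

    def range(self, low, high):
        """Return primary keys whose indexed value lies in [low, high]."""
        # (low,) sorts before every (low, key) pair, so this finds the first match
        i = bisect.bisect_left(self._entries, (low,))
        keys = []
        while i < len(self._entries) and self._entries[i][0] <= high:
            keys.append(self._entries[i][1])
            i += 1
        return keys
```

The point of the sketch is the bookkeeping: every update to a record has to call `put` (or `remove`) on the index as well, which is exactly the work an automatic secondary index would do for you.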

If you are looking for a Redis-like server with better support for range scans, I would suggest having a look at the discontinued AlchemyDB project, or the Aerospike 3 (commercial) product (which includes some of the features of AlchemyDB).

Didier Spezia
  • 70,911
  • 12
  • 189
  • 154
  • Aerospike is not open source. The whole point of supporting secondary indexes is that they are maintained automatically. – Kostja Apr 30 '13 at 09:14
2

Try using Tarantool (http://tarantool.org/). It has secondary indexes and is completely in-memory. It also uses the fast asynchronous IProto protocol.

It has proven stable and incredibly fast.

deepwalker
  • 21
  • 2
0

I was thinking he meant 7 MB per record. One other option is PlayOrm on top of Cassandra, using its Scalable-SQL. More machines can result in better performance, as the disks would be working in parallel on the range scans and such.

Dean Hiller
  • 19,235
  • 25
  • 129
  • 212
0

I presume that if you can achieve the performance requirements you have specified, the fact that a DB is or is not in memory would be irrelevant.

Your performance goals are well within the capabilities of a single, non-replicated and non-sharded MongoDB instance. Mongo uses memory-mapped files, so all the data will be in memory, but the DB will flush to disk continuously. By default Mongo uses "unsafe" mode, which eliminates much of the burden of disk I/O. It's worth considering for your use case, as opposed to trying to do in application code what a DB should do for you.

Adding replicas (Mongo's term for clustering) and/or sharding will give you a chance to easily increase performance should you need to. Multiple indexes (including composite indexes), flexible querying, bulk inserts and atomic updates are nice features that help with performance and can take the burden off your application code.

Sim
  • 13,147
  • 9
  • 66
  • 95
0

There are four types of NoSQL databases. You could check which one meets your requirements.

1. Document storage database. The atomic unit of a document storage database is a document. Each document is a JSON object. Different documents can have different schemas and contain different fields. A document storage database allows certain fields in the document to be indexed, enabling faster queries based on these fields.

2. Column storage database. The atomic unit of a column storage database is a column in the table, which means that the data is stored by column. This makes column-based queries very efficient, and because the data in each column has almost the same structure, it compresses better.

3. Key-value storage database. It's fast, first because records are looked up by unique keys, and second because most key-value storage databases store data in memory (RAM), which can be accessed quickly.

4. Graph storage database. A graph storage database contains nodes that represent entities and edges that represent relationships between entities.

cwtuan
  • 1,718
  • 1
  • 18
  • 20