
I'm thinking about using Couchbase as a cache layer. I'm aware of the many advantages Couchbase provides, such as easy scalability, but what interests me most is its rich document model, compared to the simple key-value model of memcached.

My RDBMS is SQL Server, and we use NHibernate. The queries and the database are already quite optimized and I think that caching is the best option for further scaling.

My plan is to implement a simple relational model between entities (much simpler than the one in the RDBMS) to handle invalidation. When an entity is invalidated (removed from the cache) by the application, all dependent entities would also be removed. The logic defining the dependencies between entities would be handled at the application level by a dedicated component. There would be 10 or 12 different entity types (I don't want to cache my whole application domain).

My document model in Couchbase would look like this:

  • Key (the one generated by the application); the key format depends on the entity type
  • Hashed key (to have a uniform unique key across all entities)
  • Entity
  • Dependencies - a list of hashed keys of the entities that must be removed when the main entity is removed
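This document model could be sketched roughly as follows. This is only an illustration of the structure described above; the field names, the choice of SHA-1 for the hashed key, and the helper functions are assumptions, not a prescribed schema:

```python
import hashlib
import json

def hashed_key(app_key: str) -> str:
    """Derive a uniform, fixed-length key from an application-generated key.
    SHA-1 here is an arbitrary choice for illustration."""
    return hashlib.sha1(app_key.encode("utf-8")).hexdigest()

def make_cache_doc(app_key: str, entity: dict, dependency_keys: list) -> str:
    """Build the JSON document that would be stored in Couchbase
    for one cached entity."""
    doc = {
        "key": app_key,                        # application-generated key
        "hashedKey": hashed_key(app_key),      # uniform key across entity types
        "entity": entity,                      # the cached payload
        "dependencies": [hashed_key(k) for k in dependency_keys],
    }
    return json.dumps(doc)
```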

So my questions are:

  • On invalidation, we would need to resolve a graph of dependencies (asynchronously). Is it fast to look up specific keys among around 500k entities?
  • Any feedback on the general idea?

Maintaining the dependencies between entities can be kept quite simple, and might not be such a big issue.
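To make the invalidation concrete, here is a minimal sketch of how the cascade could be resolved, using a plain dict as a stand-in for Couchbase key lookups (the `dependencies` field is the one described above; everything else is an assumption for illustration):

```python
import json
from collections import deque

def resolve_invalidation(cache: dict, root_hashed_key: str) -> set:
    """Breadth-first walk over the 'dependencies' lists to collect every
    hashed key that must be removed when the root entity is invalidated."""
    to_remove = set()
    queue = deque([root_hashed_key])
    while queue:
        key = queue.popleft()
        if key in to_remove:
            continue  # already visited; the graph may contain cycles
        to_remove.add(key)
        doc = cache.get(key)  # in production: a Couchbase get by key
        if doc is not None:
            queue.extend(json.loads(doc).get("dependencies", []))
    return to_remove
```

Since every step is a direct lookup by key, the cost scales with the size of the dependency graph being invalidated, not with the 500k entities in the cache.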

Pierre

Pierre Murasso
  • How many "invalidations" would you need to perform? Are you talking 500K entities overall or 500K invalidated each time? – theMayer Jan 23 '14 at 02:39

2 Answers


I use Couchbase 2.2 in production as a persistent cache layer and am really happy with it (running about 2M documents). My app gets really fast reads (around 1 millisecond). Your idea is valid, and I don't see anything wrong with using Couchbase as entity storage for invalidation. It's a mature and very stable product.

You are correct in your entity design. You can have a main JSON document that holds a list of references to its child documents, so that before deleting the main document you delete all the children first.

Also, and I'm not sure if it's applicable in your case, you can take advantage of Couchbase's ability to expire documents. When you insert a key/value (JSON document), you can specify a TTL (time to live) if you know it upfront. This way you don't need to explicitly delete entities from Couchbase.
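The expiry semantics work roughly like this toy in-memory illustration. This is not the Couchbase API; with Couchbase the expiry is passed on insert and enforced server-side, so the lazy-expiry logic below is only a model of the behavior:

```python
import time

class TtlCache:
    """Toy illustration of TTL semantics: an expired entry behaves
    exactly as if it had been explicitly deleted."""
    def __init__(self):
        self._store = {}

    def set(self, key, value, ttl_seconds):
        # Record the value together with its absolute expiry time.
        self._store[key] = (value, time.monotonic() + ttl_seconds)

    def get(self, key):
        item = self._store.get(key)
        if item is None:
            return None
        value, expires_at = item
        if time.monotonic() >= expires_at:
            del self._store[key]  # lazily expire on read
            return None
        return value
```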

The delete operation itself is fast (you can run it asynchronously), and 500K documents is a really small data set for a Couchbase cluster. You should see get operations complete in under 1 millisecond.

But consider having a minimum of three Couchbase nodes in one cluster, so that you can take any one node down at any given point in time without compromising the data stored in the cluster. See Sizing a Couchbase Server 2.0 cluster.


user1697575

Here are my thoughts:

On invalidation, we would need to resolve a graph of dependencies (asynchronously). Is it fast to look for specific keys with around 500k entities?

Are you looking for keys in your RDBMS or in CB? If in CB, you will need to use a view/index; views are disk-based, but stored in sorted order, so they are no slower than SQL indices. Accessing them in parallel will be faster than in series. They will still be the slow point in your operation if you use CB, though.

Continuing along with this thought, I have used CB successfully to store and navigate a hierarchical data structure with 500k+ nodes in it. CB performs well, but does take a few seconds to spit out the whole index if I need it (which I do if I need to do a mass-update operation).

Any feedback on the general idea?

The idea is sound. In fact, I'm seeing 10x the performance of SQL with hierarchical queries when I run them on my Couchbase cluster. I also found that a single Couchbase instance outperforms multiple instances when doing an index lookup - I do not know why that is (even so, the 2-instance CB index is 5x faster than my SQL setup). To speed things up further, you can parallelize the queries to the CB index.
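The parallelization could be sketched like this, with a stub standing in for one index query (the `query_index` function and its key format are placeholders, not a real Couchbase call; replace it with an actual view query):

```python
from concurrent.futures import ThreadPoolExecutor

def query_index(start_key):
    """Stand-in for one range query against a view/index.
    Returns a few fake rows keyed off the start key."""
    return [f"{start_key}:{i}" for i in range(3)]

def parallel_queries(start_keys, workers=4):
    """Issue independent index queries concurrently rather than serially,
    then flatten the per-query results into one row list."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = pool.map(query_index, start_keys)
    return [row for rows in results for row in rows]
```

Since the queries are I/O-bound, a thread pool is enough to overlap the round trips.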

theMayer
  • Yes, I'll be looking for keys in CB, either when I retrieve a single entity from the cache, or when I look up the dependencies. In the first case it's a single key; in the second case, it's a list of entities. – Pierre Murasso Jan 23 '14 at 10:04
  • What I would do then is create an index that emits the parentId as key; then you can search for all children of a given parent in one lookup. – theMayer Jan 23 '14 at 11:27