0

I have an application with entities like User, Message and MessageFeatures. Each User can have many messages and each message has a MessageFeatures entity. Currently the relational model is expressed as:

User{
  UUID id
  String email
  ...
}
Message{
  UUID id,
  UUID userId
  String text
  ....
}
MessageFeatures{
  UUID id
  UUID messageId
  UUID userId
  PrimitiveObject feature1
  ....
  PrimitiveObject featureN
}

The most important queries are:

  • Get all messages for user
  • Get all message features for a user
  • Get message by uuid
  • Get/Update message feature by uuid
  • Get message feature by message uuid

Less important(can be slow) queries are like :

  • Get message features where user_id = someuuid and featureX = value
  • Get all/count user uuids for which featureX = value
  • update message features set featureX = newValue where featureX = oldValue

While evaluating couchbase i am unable to arrive at a proper data model. I do not think putting all messages and message features for a user in a single document is a good idea because the size will keep on increasing and based on current data it will easily be in range of 4-5 MB for 2 year data. Also to maintain consistency i can update only one message feature at a time as atomicity is per document.

If i do not place them in a single document they will be scattered around the cluster and queries like get all messages/messagefeatures of a user will result in scatter and gather.

I have checked out global secondary indexes and N1QL but even if I index user_uuid field of messages it will only help in fetching the message_uuids of that user, loading all the messages will result in scatter and gather...

Is there a way to force that all messages, message features of a user_uuid get mapped to a same physical node without embedding them in the same document something like hashtags in redis.

gladiator
  • 722
  • 1
  • 9
  • 16

1 Answers1

1

You should translate the relational model above directly to Couchbase. You should create GSI indexes for all the relationships (id fields). Use EXPLAIN to make sure every query uses an index. For direct lookup by id, use USE KEYS.

Scatter/gather in Couchbase means something different than what you describe. It is when a single index scan has to visit several nodes and then merge the scan results (distributed index). Instead, each GSI index lives on a single node, so GSI indexes avoid scatter/gather.

Finally, note that Couchbase is fast at key-value fetches even across nodes, so you do not need to worry about locality of data.

geraldss
  • 2,415
  • 1
  • 11
  • 12
  • need some more clarification... according to http://www.couchbase.com/nosql-databases/downloads multi dimensional and independent scaling is not available in community edition... does this mean i will have to run all 3 services (query,index,data) on each node because if this is the case GSI wont be of much use? Am i correct or there is an intelligent maneuver around this... For ex. consider a cluster of 6 nodes and user messages are scattered on all nodes... how can a query to get all messages for a user id be efficient even with GSI considering network latency... am i missing something? – gladiator Nov 12 '15 at 18:51
  • Even though you will be running all three services on each node, each individual GSI index will only live on a single node, so you will not have scatter-gather. – geraldss Nov 12 '15 at 23:26
  • Could you point me to some resources on this so I could be sure – gladiator Nov 13 '15 at 04:35
  • Sure. Look for the Couchbase 4.0 documentation on GSI indexes. Also look for the indexing presentations and videos from Couchbase Connect 2015 (youtube, google, etc.). There are some in-depth architecture presentations. – geraldss Nov 13 '15 at 15:57