0

I have a use case where storing an _id that is a composite of the following would be immensely useful:

  1. 96 bit int
  2. 96 bit int
  3. 96 bit int ObjectID

This works out as a 36 byte ASCII string. It will be byte reversed to become big endian.

I would like to query "Give me all documents where _id starts with {1} concatenate {2}." This would return documents where _id matches components 1 & 2 above, with any 3.

Yes, this is a Cassandra approach. The rest of the query I have can only be done realistically in MongoDB.

How can I do this?

For reference, I am using C#.

IamIC
  • 17,747
  • 20
  • 91
  • 154

1 Answers1

1

With your IDs saved as a 36-byte ascii strings, you can use a regular expression match to find prefix substrings in the index.

The caveats for efficiently using an index with regular expressions in MongoDB are that:

  • the regex match should be left-rooted (which will be the case for your "_id starts with" search)
  • indexes are case-sensitive

The query would be similar to:

/* Assumption: comp1 and comp2 are ASCII string representations
   that can be combined for a key prefix */

var spec = new Document("_id", new MongoRegex(string.Format("^{0}{1}", comp1, comp2)));
collection.Find(spec)
Stennie
  • 63,885
  • 14
  • 149
  • 175
  • One question: ASCII of course is binary, so spans the whole 0 - 255 range. Will this effect the regex? – IamIC Sep 11 '12 at 10:51
  • @IanC: MongoDB regexes support UTF-8; ASCII should not be an issue. The [string format in BSON](http://bsonspec.org/#/specification) is also UTF-8. – Stennie Sep 11 '12 at 13:45
  • 1
    As noted in the related discussion on the [mongodb-user group](https://groups.google.com/forum/?fromgroups#!topic/mongodb-user/rVQqR2QByNo), you also have the option of creating a [composite _id](http://www.mongodb.org/display/DOCS/Object+IDs#ObjectIDs-The%5CidField) which could be {Int, Int, ObjectID} rather than creating a single string _id. – Stennie Sep 11 '12 at 20:12
  • thanks, I noted that. Although I would need 5 components to handle 96 & 96 bits & ObjectID. If that has no adverse effect on query speed, then it suite me fine. – IamIC Sep 11 '12 at 23:02