I would like to store a large number of JSON documents in a document-oriented database. The documents all have very similar (though not identical) schemas.

One example document:

{
     "firstName": "John",
     "lastName": "Smith",
     "age": 25
}

Do any of the systems (CouchDB etc.) use compression (of any sort) to avoid storing the key strings (e.g. "firstName") over and over again?

My motivation is to minimise the size of the database on disk when there are millions of documents, especially when some of the recurring keys are much longer than e.g. "firstName".

Thanks for your thoughts!

W


Edit: Having thought about this more, what I think I am asking about is a specific case of a more general compression system in which a compression dictionary is (partly?) shared across multiple compressed documents in a document store (and probably built up over time). This would then handle compression of more than just JSON keys.

Would be interesting to do!
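
A minimal sketch of that shared-dictionary idea, assuming Python's zlib module and its preset-dictionary support (the zdict argument). The contents of SHARED_DICT and the helper names are made up for illustration; a real store would presumably build the dictionary from the documents it has already seen:

```python
import json
import zlib

# Hypothetical shared dictionary: byte sequences expected to recur in most
# documents (here, the JSON keys from the example above).
SHARED_DICT = b'{"firstName": "lastName": "age": '

def compress_doc(doc, zdict=SHARED_DICT):
    """Serialise a document and deflate it against the shared dictionary."""
    raw = json.dumps(doc).encode("utf-8")
    compressor = zlib.compressobj(zdict=zdict)
    return compressor.compress(raw) + compressor.flush()

def decompress_doc(blob, zdict=SHARED_DICT):
    """Inflate a stored blob; the same dictionary must be supplied again."""
    decompressor = zlib.decompressobj(zdict=zdict)
    raw = decompressor.decompress(blob) + decompressor.flush()
    return json.loads(raw.decode("utf-8"))

doc = {"firstName": "John", "lastName": "Smith", "age": 25}
blob = compress_doc(doc)
restored = decompress_doc(blob)
```

The dictionary is stored once, outside the documents, so each compressed document can refer back to the recurring key strings rather than carrying its own copy. The catch is that a document can only be read with the exact dictionary it was written with, so the dictionary would need versioning if it is built up over time.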

wodow
  • I'm not aware of any document stores that support compression at this time (doesn't mean that there aren't any). There is a JIRA open on Mongo to support this: http://jira.mongodb.org/browse/SERVER-164?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel – Spike Gronim Feb 16 '11 at 00:00
  • The Mongo JIRA is about gzip compression in general, which is not quite the same thing, though it possibly would be if the compression dictionary were built up and shared across multiple documents within a store. – wodow Feb 16 '11 at 11:14

1 Answer

I would just add a 'key mapping' document in which you store the keys and their short aliases. Doing the mapping in your backend should not be much trouble:

{
   FirstName: 'a',
   Town: 'b'
}

{ 
  a: 'Peter',
  b: 'Zurich'
}
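
A rough sketch of what that backend mapping could look like, in Python. KEY_MAP mirrors the mapping document above; the function names are hypothetical, and the sketch assumes flat documents whose real keys never collide with an alias:

```python
# Hypothetical mapping document: long key -> short alias.
KEY_MAP = {"FirstName": "a", "Town": "b"}
# Inverse lookup, used when reading documents back.
REVERSE_MAP = {alias: key for key, alias in KEY_MAP.items()}

def shorten(doc):
    """Swap known long keys for their aliases before writing to the store."""
    return {KEY_MAP.get(key, key): value for key, value in doc.items()}

def expand(stored):
    """Restore the long keys after reading from the store."""
    return {REVERSE_MAP.get(key, key): value for key, value in stored.items()}

original = {"FirstName": "Peter", "Town": "Zurich"}
stored = shorten(original)    # what actually goes into the database
restored = expand(stored)     # what the application sees
```

Keys missing from the mapping pass through unchanged, so documents with slightly different schemas still round-trip. Nested documents would need a recursive version.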
Tobi Oetiker
  • Thanks, Tobi, that is what I am thinking. However, I would argue that this feature is desirable for so many uses of a document store that it makes more sense to implement it in the document store server itself or in a wrapping layer. – wodow Feb 16 '11 at 11:12
  • I agree, but since there are other considerations when choosing a NoSQL store, I would not make something as trivial as this your primary selection criterion. – Tobi Oetiker Feb 17 '11 at 05:36