
I have documents that exceed 16MB. These documents consist of many key/value pairs held in subdocuments (dicts) and arrays (lists), which may be nested several levels deep.

If I try to insert one of these larger-than-16MB documents, I get an error saying the document exceeds the 16MB limit. So I started looking into GridFS. GridFS seems great for chunking up files such as binary data. However, I am not clear on how I would "chunk up" highly nested key/value docs like the ones I described above. I am thinking I may just need to break these huge docs down into smaller docs and bite the bullet and implement transactions, since there is no atomicity when inserting multiple documents.

Is my understanding of GridFS way off? Is breaking up the doc into smaller documents with transaction support the best way forward, or is there a way to use GridFS here?

Kind thanks for your attention.

SYNAX

2 Answers


Just curious: why are you storing key/value pairs in a single document instead of a collection?

If you need that many of them, you can just store them in a collection (assuming they're all unique and not part of any nested structure).
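For example (a minimal pymongo sketch; the collection and field names here are placeholders I made up):

    from pymongo import MongoClient

    db = MongoClient()["mydb"]  # assumes a local mongod

    # Instead of one giant document, store each key/value pair as its own document.
    kv = db["kv_entries"]
    kv.create_index("key", unique=True)

    kv.insert_one({"key": "user.profile.name", "value": "Alice"})
    kv.insert_one({"key": "user.profile.age", "value": 42})

    print(kv.find_one({"key": "user.profile.name"})["value"])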

Or you could migrate that data to Redis, which would be more performant at key/value lookups anyway and has no comparable size limit. It's okay to mix multiple storage engines.
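A rough Redis equivalent with redis-py (assuming a local Redis instance; the key names are illustrative):

    import redis

    r = redis.Redis()  # assumes Redis is running locally

    # Flat key/value data maps directly onto a Redis hash.
    r.hset("user:1001", mapping={"name": "Alice", "age": 42})
    print(r.hget("user:1001", "name"))  # b'Alice'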

Edit in response to comment 1:

If you're putting 16MB of key/value pairs into a single document, I would actually question how you are modelling your data. Just because the database is schemaless doesn't mean the correct way to store key/values in Mongo is in one large document.

Are you able to provide more information on what you are trying to do, so that we can better understand your needs and give you better answers? I'm sure we can help you more than this.

JasonG
  • Well, because then I have to worry about transactions, right? Are you suggesting smaller docs with references to a master entity? I like the idea of Redis, but that means I need to do some data modeling. The nice thing about Mongo was just being able to throw an arbitrary doc at it... not necessarily a bad thing; I may take a closer look at it. Thanks for your response. – SYNAX Feb 20 '13 at 22:14
  • I added a question in response to your comment above – can you provide the requested details? – JasonG Feb 21 '13 at 01:55

GridFS treats the files as opaque binary blobs. It doesn't make a distinction between a "key/value document" and, say, an image file.

If you want to do queries, etc. on the values contained in your documents, you'll need to split them into smaller documents yourself. On the other hand, if your documents are really just opaque blobs of data that happen to have internal structure (which you only care about within your program, not in the DB), then GridFS is a good choice.
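In the opaque-blob case, here is a rough sketch with pymongo's gridfs module (the file name and sample structure are just for illustration): you serialize the document yourself and let GridFS handle the chunking.

    import json
    import gridfs
    from pymongo import MongoClient

    db = MongoClient()["mydb"]
    fs = gridfs.GridFS(db)

    huge_doc = {"config": {"nested": {"levels": ["of", "arbitrary", "depth"]}}}

    # Store the whole document as one opaque blob; GridFS splits it into chunks.
    file_id = fs.put(json.dumps(huge_doc).encode("utf-8"), filename="huge_doc.json")

    # Read it back and parse it in the application; the DB can't query inside it.
    restored = json.loads(fs.get(file_id).read().decode("utf-8"))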

Another consideration is performance: do you really need to read and write giant documents of 16MB+? Or are you generally dealing with only a subset of each document? If the former, use GridFS; if the latter, split your documents across different collections with references between them.
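A sketch of the latter approach, again with made-up collection and field names:

    from pymongo import MongoClient

    db = MongoClient()["mydb"]

    # Keep a small parent document and push the bulky nested sections
    # into a separate collection, referenced by the parent's _id.
    parent_id = db.entities.insert_one({"name": "big-entity"}).inserted_id

    db.entity_parts.insert_many([
        {"parent_id": parent_id, "section": "metrics", "data": {"samples": [1, 2, 3]}},
        {"parent_id": parent_id, "section": "history", "data": {"events": []}},
    ])

    # Load only the part you need instead of the whole 16MB+ document.
    metrics = db.entity_parts.find_one({"parent_id": parent_id, "section": "metrics"})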

Cameron
  • Thanks Cameron – just wanted to be sure I wasn't misunderstanding GridFS. I think I will need to break the docs down and wrap some transactions around it. Thanks again! – SYNAX Feb 20 '13 at 22:12