
From what I've read, the recommendation is to store anything under 16 MB directly in a document and anything larger in GridFS. Fine. My question is: what is the best way to do that?

My application processes hundreds of files concurrently, and the files can be any size, from 1 KB to 100 GB (or more). The majority will be under 16 MB, but I don't want to read the entire input stream into a big byte array, set that byte array as an attribute on a document, and then store the document. That creates a ton of overhead for us.

I want to be able to store the document, but stream the input stream into it, and in reverse, stream the stored data back out as an input stream.
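Ideally something along these lines, purely as a sketch of what I'm after, using the Java driver's GridFSBucket streaming API (if that's even the right tool for this); the bucket name "files" and the 1 MB chunk size are just placeholders:

    import com.mongodb.client.MongoDatabase;
    import com.mongodb.client.gridfs.GridFSBucket;
    import com.mongodb.client.gridfs.GridFSBuckets;
    import com.mongodb.client.gridfs.model.GridFSUploadOptions;
    import org.bson.types.ObjectId;

    import java.io.InputStream;
    import java.io.OutputStream;

    public class GridFsStreaming {

        // Stream the input into GridFS chunk by chunk; the whole payload is
        // never buffered in memory on our side.
        public static ObjectId store(MongoDatabase db, String filename, InputStream in) {
            GridFSBucket bucket = GridFSBuckets.create(db, "files");
            GridFSUploadOptions options = new GridFSUploadOptions()
                    .chunkSizeBytes(1024 * 1024); // 1 MB chunks, just a guess at a sane default
            return bucket.uploadFromStream(filename, in, options);
        }

        // Stream a stored file back out to any OutputStream (HTTP response, socket, file, ...).
        public static void load(MongoDatabase db, ObjectId id, OutputStream out) {
            GridFSBucket bucket = GridFSBuckets.create(db, "files");
            bucket.downloadToStream(id, out);
        }
    }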

Anyone have suggestions?

  • If these files are all 'equal', meaning that the reader of those files doesn't know how big each one is, you should store them all in GridFS so you don't need to go looking for them! – Nils Ziehn Oct 05 '15 at 17:19
  • You might also consider a third option: Store the files on a storage area network and only store the metadata in MongoDB. – Philipp Oct 05 '15 at 17:39
  • We want to get away from NFS because it can't scale horizontally for us. We also need to scale widely, that is, across data centers and continents. – Bob Krier Oct 05 '15 at 17:53
  • Also, I was considering storing anything above 16 MB in GridFS and everything below it in a document, as I've seen suggested elsewhere (sketched below these comments). I don't have to worry about the files ever changing. What I can expect is that I'll have an input stream of arbitrary bytes and that the length will be known. Sometimes the files won't come from disk, but from an HTTP POST, POP3 email, or a JMS queue, just to name a few. – Bob Krier Oct 05 '15 at 17:55
  • Can you put the data into a cloud blob service like AWS S3, Azure Blob Storage or Google Storage? – Nils Ziehn Oct 05 '15 at 20:14
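To make the hybrid idea from my comment above concrete, here is a rough sketch of what I mean, assuming the length is known up front; the 15 MB cutoff, collection name, and field names are just placeholders I made up:

    import com.mongodb.client.MongoDatabase;
    import com.mongodb.client.gridfs.GridFSBuckets;
    import org.bson.Document;
    import org.bson.types.Binary;
    import org.bson.types.ObjectId;

    import java.io.DataInputStream;
    import java.io.IOException;
    import java.io.InputStream;

    public class HybridStore {

        // Below this size the bytes are embedded in a regular document; at or
        // above it they are streamed into GridFS and only a reference is kept.
        private static final long INLINE_LIMIT = 15L * 1024 * 1024; // headroom under the 16 MB document cap

        public static void store(MongoDatabase db, String name, InputStream in, long length)
                throws IOException {
            Document doc = new Document("name", name).append("length", length);
            if (length < INLINE_LIMIT) {
                byte[] bytes = new byte[(int) length];
                new DataInputStream(in).readFully(bytes); // small file: buffering is acceptable here
                doc.append("data", new Binary(bytes));
            } else {
                ObjectId gridFsId = GridFSBuckets.create(db).uploadFromStream(name, in);
                doc.append("gridFsId", gridFsId);
            }
            db.getCollection("files.meta").insertOne(doc);
        }
    }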
