I am new to MongoDB and want to use it for storing files, i.e. images and videos (sizes can exceed 40-50MB). For that we can use MongoDB GridFS. But GridFS has two collections, chunks and files. Which of these collections, chunks or files, is the right one for me, and on what basis do we decide?

REQUIREMENTS:

  • store images and videos
  • videos can be more than 40-50MB
  • frequent access to this media
murtaza.webdev
  • If you want to use GridFS, you should use their higher-level API. That will manage chunks and files collections for you. If you access those directly (and do it wrong), you could break things. – Thilo Sep 16 '14 at 07:31
  • For storing files (images / videos) we have the option of either files or chunks, right? Which of the two should we choose? – murtaza.webdev Sep 16 '14 at 07:35
  • It's not a choice. "files" contains the "metadata": file name, size etc., basically whatever information you want. "chunks" is the actual content, broken into "chunks" to stay under the BSON limit. The driver spec covers how to "read/write" the chunks transparently. So it's not a choice, just how it is done. – Neil Lunn Sep 16 '14 at 07:44
  • Beat me to it. I'd make that the answer @NeilLunn – tom Sep 16 '14 at 08:03

1 Answer

There is possibly a bit of confusion here about what GridFS actually is. It is not something that MongoDB itself "does"; it is really just a driver specification for how to store data beyond the 16MB BSON limit in standard collections.

To do this, GridFS implementations use two collections. One is generally named "files" and the other "chunks". These have different purposes and are not a "choice" of where to store your data, as the question assumes.

The "files" collection is for "metadata", which is just some information about the "file", and basically is whatever you want it to be. This "describes" the file and, most importantly, its _id is the "reference" that the documents in the "chunks" collection use (through their files_id field) to identify the file they belong to. As a sample:

db.fs.files.findOne()
{
    "_id" : ObjectId("533b67d8afc27c15fc82caf4"),
    "filename" : "twig.pl",
    "chunkSize" : 262144,
    "uploadDate" : ISODate("2014-04-02T01:28:56.915Z"),
    "md5" : "9b10c69537126652aebc2742ca3ad69a",
    "length" : 267
}

So there is an _id and some other data about the file. It's just a standard collection and you can query it as such.

The "chunks" collection holds the "parts" of the actual "file" content. In brief form, a chunk document looks something like this:

{
    "_id" : ObjectId("533b67d8c6ed8872a7fa9ff0"),
    "files_id" : ObjectId("533b67d8afc27c15fc82caf4"),
    "n" : 0,
    "data" : BinData(0,"IyEvdXNyL2Jpbi9lbnYg....")
}

And there will be as many of those as are required to store the content.
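To make "as many as required" concrete, here is a minimal sketch of the arithmetic, assuming every chunk except the last is filled to exactly chunkSize bytes. The function name chunkCount and the 50 MiB video size are illustrative assumptions, not part of any driver API.

```javascript
// Number of chunk documents needed for a file of `length` bytes.
// Assumption: every chunk except the last holds exactly `chunkSize` bytes.
// `chunkCount` is a hypothetical helper, not a MongoDB driver function.
function chunkCount(length, chunkSize) {
  if (!Number.isInteger(length) || length < 0) {
    throw new RangeError("length must be a non-negative integer");
  }
  if (!Number.isInteger(chunkSize) || chunkSize <= 0) {
    throw new RangeError("chunkSize must be a positive integer");
  }
  return Math.ceil(length / chunkSize);
}

// The sample "files" document: 267 bytes with a 262144-byte chunk size.
const sampleChunks = chunkCount(267, 262144);

// A hypothetical 50 MiB video stored with 255 KiB chunks.
const videoChunks = chunkCount(50 * 1024 * 1024, 255 * 1024);

console.log(sampleChunks, videoChunks); // prints "1 201"
```

So a small script fits in a single chunk document, while a video of that size is spread over a couple of hundred documents that all carry the same files_id.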

As for the "size" of the chunks, this is generally up to the driver implementation, and there is usually a way to specify what to use. From the specification:

"By default GridFS limits chunk size to 255k..."

But of course you should try to keep this consistent in your implementation. As you can see from the "files" document shown earlier, the specification is to "store" the chunk size with the metadata (there it is 262144 bytes, i.e. 256k, so the value in use can differ from the quoted default). That way it can be determined when reading back and "constructing" a handle of sorts.
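As a rough illustration of why chunkSize travels with the metadata, the following sketch models the write side in memory. It is not the driver's implementation: plain objects stand in for MongoDB documents, a string stands in for an ObjectId, and toGridFsDocs, "demo-id" and "digits.txt" are made-up names for this example.

```javascript
// In-memory model of how a GridFS-style writer might lay out one file.
// Plain objects stand in for MongoDB documents; nothing here talks to a server.
// `toGridFsDocs` and the string `fileId` are hypothetical, for illustration only.
function toGridFsDocs(fileId, filename, content, chunkSize) {
  if (!Number.isInteger(chunkSize) || chunkSize <= 0) {
    throw new RangeError("chunkSize must be a positive integer");
  }
  const chunkDocs = [];
  for (let offset = 0, n = 0; offset < content.length; offset += chunkSize, n += 1) {
    chunkDocs.push({
      files_id: fileId, // points back at the "files" document
      n: n,             // position of this chunk within the file
      data: content.subarray(offset, offset + chunkSize),
    });
  }
  const fileDoc = {
    _id: fileId,
    filename: filename,
    chunkSize: chunkSize, // recorded so a reader knows how the content was split
    uploadDate: new Date(),
    length: content.length,
  };
  return { fileDoc, chunkDocs };
}

// A tiny chunk size of 4 bytes keeps the output readable.
const { fileDoc, chunkDocs } = toGridFsDocs("demo-id", "digits.txt", Buffer.from("0123456789"), 4);
console.log(fileDoc.length, chunkDocs.map((c) => c.data.toString()));
// prints: 10 [ '0123', '4567', '89' ]
```

A real driver does the equivalent work for you and additionally handles ids, indexes and streaming, which is why the higher-level API is the recommended route.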

So the "driver implementation" will actually handle how the "read/write" operations on chunks occur, and will usually do something to present the results as a "file" or "stream" of sorts. But these are just "ordinary collections" and nothing special in themselves. So all normal query and CRUD operations work on these collections just like any other.
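The read side can be sketched the same way. In this in-memory model two arrays stand in for the "files" and "chunks" collections, and the lookup, filter and sort steps mirror the ordinary queries a reader would run. The names filesCollection, chunksCollection and readFile are assumptions for illustration; a real application should go through the driver's GridFS API instead.

```javascript
// Two arrays stand in for the "files" and "chunks" collections.
// All ids, names and contents below are made up for this example.
const filesCollection = [
  { _id: "demo-id", filename: "digits.txt", chunkSize: 4, length: 10 },
];
const chunksCollection = [
  { files_id: "demo-id", n: 2, data: Buffer.from("89") },
  { files_id: "other-id", n: 0, data: Buffer.from("zzz") },
  { files_id: "demo-id", n: 0, data: Buffer.from("0123") },
  { files_id: "demo-id", n: 1, data: Buffer.from("4567") },
];

// Hypothetical reader: find the metadata, then gather that file's chunks in order.
function readFile(filename) {
  const fileDoc = filesCollection.find((f) => f.filename === filename);
  if (!fileDoc) throw new Error("no such file: " + filename);

  const parts = chunksCollection
    .filter((c) => c.files_id === fileDoc._id) // only this file's chunks
    .sort((a, b) => a.n - b.n)                 // restore the original order
    .map((c) => c.data);

  const content = Buffer.concat(parts);
  if (content.length !== fileDoc.length) {
    throw new Error("chunk data does not match the recorded length");
  }
  return content;
}

const restored = readFile("digits.txt").toString();
console.log(restored); // prints "0123456789"
```

The chunks are deliberately stored out of order here to show that the n field, not insertion order, determines how the content is put back together.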

Neil Lunn
  • +1. And if you can use an official driver with a GridFS implementation, just use that one. Don't mess with the internals. – Thilo Sep 17 '14 at 00:56