There is possibly a bit of confusion here about what GridFS actually is. It is not something that MongoDB itself "does"; it is really just a driver specification for storing data beyond the 16MB BSON document limit in standard collections.
To do this, GridFS implementations use two collections. One is generally named "files" and the other "chunks". They have different purposes and are not a "choice" of where to store the data, as you ask.
The "files" collection is for "metadata", which is just some information about the "file", and basically is whatever you want it to be. This "describes" the file and most importantly acts as a "reference" to the _id
used to identify the file in the "chunks" collection. As a sample:
db.fs.files.findOne()
{
    "_id" : ObjectId("533b67d8afc27c15fc82caf4"),
    "filename" : "twig.pl",
    "chunkSize" : 262144,
    "uploadDate" : ISODate("2014-04-02T01:28:56.915Z"),
    "md5" : "9b10c69537126652aebc2742ca3ad69a",
    "length" : 267
}
So there is an _id and some other data about the file. It's just a standard collection and you can query it as such.
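For example, since it is an ordinary collection, you can look up a file's metadata directly in the shell; the "fs" prefix here is just the default GridFS bucket name:

// Find the metadata document for a stored file by its filename
db.fs.files.find({ "filename": "twig.pl" }).pretty()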
Of course the "chunks" actually refers to the "parts" of the actual "file", and in a brief form will look something like this:
{
    "_id" : ObjectId("533b67d8c6ed8872a7fa9ff0"),
    "files_id" : ObjectId("533b67d8afc27c15fc82caf4"),
    "n" : 0,
    "data" : BinData(0,"IyEvdXNyL2Jpbi9lbnYg....")
}
And there will be as many of those documents as are required to actually store the content.
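This is essentially what a driver does when reading the file back: it fetches the chunks for the given files_id in order of n and concatenates the binary data. A rough sketch in the shell, using the sample _id from above:

// Fetch all chunks belonging to the file, in ascending chunk order
db.fs.chunks.find(
    { "files_id": ObjectId("533b67d8afc27c15fc82caf4") }
).sort({ "n": 1 })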
As for the "size" of the chunks, this is generally up to the driver implementation, but there would usually be a way to specify what to use, but from the specification:
"By default GridFS limits chunk size to 255k..."
But of course you should try to keep this consistent in your implementation. As you can see from the "meta" document above, the specification stores that information with the metadata so that it can be determined when reading back and "constructing" a handle of sorts.
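For instance, the length and chunkSize stored in the metadata are enough to work out how many chunk documents to expect when reading back. A minimal sketch in the shell:

// Work out the expected number of chunks from the stored metadata
var meta = db.fs.files.findOne({ "_id": ObjectId("533b67d8afc27c15fc82caf4") })
var numChunks = Math.ceil(meta.length / meta.chunkSize)   // ceil(267 / 262144) = 1 chunk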
So the "driver implementaion" will actually handle how the "read/write" operations on chunks occurs, and usually do something to present the results as a "file" or "stream" of sorts. But these are just "ordinary collections" and nothing special in themselves. So all normal query and CRUD operations work on these collections just like any other.