I am building a website where users will be allowed to upload images. There is also a limit on the maximum amount of space each user can use.

I have two ideas in mind:

  1. Store the images in a NoSQL database such as MongoDB, using GridFS.
  2. Store the images in the file system and keep the path in the database.

Which of these is better, and why?

Ahmed Shabib
  • It's very subjective, but personally I would always put images in a filesystem where they can be individually managed, accessed, modified, distributed and backed up rather than in some enormous, amorphous glob of a multi-terabyte database that takes hours to back up and where you need a heap of SQL to even do the simplest of things. YMMV. – Mark Setchell Apr 24 '14 at 08:22
  • This question will probably be better suited for programmers.stackexchange.com if you change it to be less subjective. Both approaches can be good, depending on multiple factors (e.g. size of the data, usage patterns, how many servers you have etc.). Try to edit this question to make it less broad and answerable. – Christian P Apr 24 '14 at 09:04
  • You should consider the costs of hosting and managing a large MongoDB solution vs a file system or a solution like AWS S3/Azure Blob/Etc. – WiredPrairie Apr 24 '14 at 10:43

1 Answer

Sigh. Why does everybody jump to GridFS?

Depending on the size of the images and the exact use case, I'd recommend storing the images directly in the DB as ordinary documents (not via GridFS). Here's why:

File System

  • Storing the images in the file system is proven to work well, but it's not trivial
  • You will need a separate backup system, failover, replication, etc., in addition to whatever the database already has. This can be tricky DevOps-wise.
  • You will need to create a smart directory structure, and that is a leaky abstraction because different file systems have very different characteristics. Some have no problem storing 16k files in one folder; others start to choke at a mere 1k files. A common approach is to use a convention like af/2c/af2c2ab3852df91.jpg, where the folders af and 2c are inferred from the file name (which itself might be a hash of the content for deduplication purposes).
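
The af/2c/... convention mentioned above can be sketched in a few lines of Python. This is only an illustration under assumptions: the function name `sharded_path`, the choice of SHA-256, and the two-level depth are not from the answer. With two hex characters per level, each directory holds at most 256 subfolders, and files spread over 65,536 leaf folders.

```python
import hashlib
from pathlib import PurePosixPath


def sharded_path(content: bytes, ext: str = ".jpg", levels: int = 2) -> PurePosixPath:
    """Return a relative path such as af/2c/af2c...jpg derived from the content hash.

    Hypothetical helper: the name, SHA-256 and the default depth are assumptions.
    """
    digest = hashlib.sha256(content).hexdigest()
    # The first `levels` pairs of hex characters become nested folder names.
    folders = [digest[i * 2:i * 2 + 2] for i in range(levels)]
    # The file name is the full digest, so identical uploads map to the same path.
    return PurePosixPath(*folders, digest + ext)


if __name__ == "__main__":
    print(sharded_path(b"example image bytes"))
```

Because the path depends only on the bytes, uploading the same image twice yields the same location, which is where the deduplication comes from.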

GridFS

GridFS is made for storing large files, and for storing files in a very similar way to a file system. That comes with some disadvantages:

  • For every file, you will need one document in fs.files and at least one in fs.chunks. Chunking is essential for large files, but if your files are below 256k on average, there's no real chunking going on (the default chunk size was 256k when this was written; current versions use 255 kB). So when storing small files in GridFS, you get the overhead without the advantage. Bad deal. Reading a file also requires two queries instead of one.
  • It imposes a certain structure on your collection, for instance a 'file name' field. It depends on the use case, but I often choose to use a hash of the content as the id and store that hash in the user document, for example. That deduplicates, is easy to implement, aligns beautifully with caching and doesn't require coming up with any convention. It's also very efficient because the indexed key is a compact byte array.
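
To put a number on the overhead argument, here is a rough sketch that counts how many documents GridFS writes per file. It assumes the 255 kB default chunk size of current MongoDB versions (older versions used 256 kB), and the function name is made up for illustration.

```python
import math

# Assumption: 255 kB default chunk size (older MongoDB versions used 256 kB).
DEFAULT_CHUNK_SIZE = 255 * 1024


def gridfs_document_count(file_size: int, chunk_size: int = DEFAULT_CHUNK_SIZE) -> int:
    """Documents written for one file: one fs.files entry plus its fs.chunks entries."""
    chunks = math.ceil(file_size / chunk_size)
    return 1 + chunks


if __name__ == "__main__":
    for size in (40 * 1024, 10 * 1024 * 1024):
        print(size, gridfs_document_count(size))
```

A 40 kB thumbnail ends up as 2 documents where a flat document would need 1, while a 10 MB photo becomes 42 documents, which is the case chunking is designed for.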

Things might look different if you're operating a site for photographers where they can upload their RAW files or large JPEGs at 10 MB. In that case, GridFS is probably a good choice, not least because a single MongoDB document is limited to 16 MB. For storing user images, thumbnails, etc., I'd simply store each image flat in its own document.
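
A minimal sketch of the flat-document idea, with the content hash as the id, might look like this. To keep it runnable without a database, a plain dict stands in for the image collection and another for the user document; the field names (`content_type`, `length`, `data`, `image_ids`) and the 15 MB cutoff are assumptions chosen for illustration. With a real driver, the `setdefault` step would become an upsert keyed on `_id`.

```python
import hashlib

# Assumption: an arbitrary cutoff that leaves headroom under the 16 MB document limit.
MAX_INLINE_BYTES = 15 * 1024 * 1024


def make_image_document(data: bytes, content_type: str) -> dict:
    """Build a flat document whose _id is the raw SHA-256 digest of the image bytes."""
    if len(data) > MAX_INLINE_BYTES:
        raise ValueError("image too large for a single document; consider GridFS")
    return {
        "_id": hashlib.sha256(data).digest(),  # 32 raw bytes, not a hex string
        "content_type": content_type,
        "length": len(data),
        "data": data,
    }


def store_image(images: dict, user: dict, data: bytes, content_type: str) -> bytes:
    """Store the image once and reference it from the user; returns the image id.

    `images` is an in-memory stand-in for a collection, `user` for a user document.
    """
    doc = make_image_document(data, content_type)
    images.setdefault(doc["_id"], doc)  # identical uploads collapse to one document
    ids = user.setdefault("image_ids", [])
    if doc["_id"] not in ids:
        ids.append(doc["_id"])
    return doc["_id"]


if __name__ == "__main__":
    images, user = {}, {"name": "alice"}
    image_id = store_image(images, user, b"fake image bytes", "image/jpeg")
    print(image_id.hex(), len(images))
```

Reading an image back is then a single lookup by id, in contrast to the two queries GridFS needs.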

mnemosyn