I'm using MongoDB GridFS as a kind of file cache where I store binary data together with some metadata. Because my disk space is limited, I cannot keep all my files in MongoDB, so I'd like to apply a FIFO algorithm.

I've found capped collections, which seem to fit this case perfectly, but MongoDB stores GridFS data in two different, closely related collections (fs.chunks and fs.files). Is it possible to define one (or both?) of these collections as capped, or will I get zombie data in the other collection?
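To make the zombie-data concern concrete, here is a minimal in-memory sketch of application-level FIFO eviction over a files/chunks pair. It does not use capped collections or MongoDB at all; `FifoFileCache`, `CHUNK_SIZE` and the dictionary layout are hypothetical stand-ins for fs.files and fs.chunks. The point is only that an eviction step has to remove a file entry and all of its chunks together.

```python
from collections import OrderedDict

CHUNK_SIZE = 4  # hypothetical, tiny on purpose so the example is easy to follow


class FifoFileCache:
    """In-memory stand-in for the fs.files / fs.chunks pair with FIFO eviction."""

    def __init__(self, max_bytes):
        self.max_bytes = max_bytes
        self.used_bytes = 0
        self.files = OrderedDict()  # file_id -> file entry, oldest first
        self.chunks = {}            # (file_id, n) -> bytes

    def put(self, file_id, data, metadata=None):
        if len(data) > self.max_bytes:
            raise ValueError("file is larger than the whole cache")
        if file_id in self.files:
            self.delete(file_id)  # overwrite: drop the old copy first
        # Evict the oldest files until the new one fits.
        while self.used_bytes + len(data) > self.max_bytes:
            self.delete(next(iter(self.files)))
        chunk_count = 0
        for n, start in enumerate(range(0, len(data), CHUNK_SIZE)):
            self.chunks[(file_id, n)] = data[start:start + CHUNK_SIZE]
            chunk_count = n + 1
        self.files[file_id] = {
            "length": len(data),
            "chunk_count": chunk_count,
            "metadata": metadata or {},
        }
        self.used_bytes += len(data)

    def delete(self, file_id):
        # Remove the file entry and every chunk in one step, so neither
        # store is left holding orphans.
        entry = self.files.pop(file_id)
        for n in range(entry["chunk_count"]):
            del self.chunks[(file_id, n)]
        self.used_bytes -= entry["length"]

    def get(self, file_id):
        entry = self.files[file_id]
        return b"".join(
            self.chunks[(file_id, n)] for n in range(entry["chunk_count"])
        )
```

Against a real GridFS bucket, the equivalent of `delete` would presumably be a call such as PyMongo's `GridFSBucket.delete(file_id)`, which removes the file document and its chunks together; the size accounting above would still have to be done by the application.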

  • You can also insert binary or raw data directly into a field of a document, so you don't need to use GridFS if your files are under 16 MB. If they are bigger, however, you might find TTL indexes more to your liking – Sammaye Jul 04 '14 at 08:38
  • I don't think that a backend service is the right place to cache files. Depending on the use case, you can use either [varnish](https://www.varnish-cache.org) or [memcached](http://memcached.org). While memcached has to be used by the application and does not use disk space (except for paging, iirc), varnish is a reverse proxy which can use disk space and RAM, depending on your needs. So I'd use memcached for authorized access scenarios and varnish for non-authorized access scenarios. In case you use Java, you might want to use Hazelcast. – Markus W Mahlberg Jul 04 '14 at 12:41
  • @MarkusWMahlberg wouldn't memcached be "backend" as well? It is effectively an in-memory key-value store for use with server-side languages – Sammaye Jul 04 '14 at 13:05
  • Yes, it is. Kind of. A very specialized backend. Developed for *caching*. Who would have guessed that? ;) – Markus W Mahlberg Jul 04 '14 at 16:02
  • @MarkusWMahlberg it is designed for backend (database) caching, it is not like proxying with nginx – Sammaye Jul 04 '14 at 16:50
  • @MarkusWMahlberg in fact it is a completely different kind of cache than varnish too – Sammaye Jul 04 '14 at 16:53
  • Which is why I suggested both, and the OP should choose depending on his or her use case. And as far as I understood, we are not talking about database caching but about the caching of files. For this application, there are quite large installations of both suggestions. – Markus W Mahlberg Jul 04 '14 at 23:45
  • @Sammaye An example for varnish would be vimeo, which streams videos directly from varnish. An example of a comparable memcached installation is Wikipedia's caching centers, which consist solely of `memcached` instances that serve prerendered pages. – Markus W Mahlberg Jul 05 '14 at 00:00
  • @MarkusWMahlberg ok, fair enough, there is a way to serve from memcached without touching the backend – Sammaye Jul 05 '14 at 14:35
  • Capped collections + GridFS definitely aren't the right fit for your use case. Per the other comments here, it sounds like you're looking for a solution which is more of a caching data store (storing key/value blobs) or a file cache. An obvious choice would be something like Amazon S3. – Stennie Jul 05 '14 at 22:20
  • Thanks a lot for your information. I guess memcached won't help, because the amount of data won't fit into RAM. I didn't know varnish before, but I'll have a look at it. At first glance it looks more like an HTTP server cache than a file system access caching solution. – terribleherbst Jul 07 '14 at 06:34
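
The first comment's suggestion (binary data in an ordinary document plus a TTL index) could be sketched roughly as follows. This is only a sketch under assumptions: `collection` is expected to behave like a PyMongo `Collection`, and the helper names and the `createdAt`, `data` and `metadata` fields are made up for illustration. Note that a TTL index expires documents by age, not by total size, so it only approximates FIFO.

```python
import datetime

# BSON documents are limited to 16 MB in total, so the payload has to stay
# below that with some room left for the other fields.
MAX_DOCUMENT_BYTES = 16 * 1024 * 1024


def ensure_ttl_index(collection, ttl_seconds):
    """Create a TTL index so documents expire ttl_seconds after createdAt.

    `collection` is assumed to be a PyMongo Collection (or a compatible object).
    """
    return collection.create_index("createdAt", expireAfterSeconds=ttl_seconds)


def cache_blob(collection, key, data, metadata=None):
    """Store a small binary blob as a single document, keyed by `key`."""
    if len(data) >= MAX_DOCUMENT_BYTES:
        raise ValueError("blob too large for a single document; use GridFS")
    doc = {
        "_id": key,
        "data": data,  # Python 3 bytes are stored as BSON binary by PyMongo
        "metadata": metadata or {},
        # The TTL index needs a date field to measure the document's age.
        "createdAt": datetime.datetime.now(datetime.timezone.utc),
    }
    collection.replace_one({"_id": key}, doc, upsert=True)
    return doc
```

With a real deployment this might be called as `ensure_ttl_index(db.file_cache, 3600)` followed by `cache_blob(db.file_cache, "report.pdf", payload)`, where `db.file_cache` is a hypothetical collection name. Expired documents are removed by a background task, so they can linger for a short while after their expiry time.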
