
I need a NoSQL database to store our files, with the following requirements:

  • Easy clustering
  • Distributed
  • Automatic failover
  • Easily scalable
  • Fast

I have searched and found some solutions like "MongoDB GridFS" or "Riak", but I think I have to research both sides further: the requirements (download server requirements) and the candidate solutions.

Does anybody have any experience in this field?

Saeed Zarinfam
    Is there a reason you want to store these files in a database rather than as files in a distributed file system or a file cluster? In my experience, storing actual files (rather than paths) in a database has always caused a lot of headaches. – Ruslan Feb 15 '14 at 05:11
  • For centralize management and Scalability, do you have better solution? – Saeed Zarinfam Feb 15 '14 at 05:15
    There are several solutions that come to mind. The reason I wouldn't recommend using a database is because a database will come with plenty of complexities of its own in terms of management and administration, while it will not give you any real advantages in regards to managing files - since a database's primary strength is manipulating and organizing data. Are you on a Linux or Windows platform? – Ruslan Feb 15 '14 at 05:19
  • I am on Linux, There are a lot of metadata around these files. – Saeed Zarinfam Feb 15 '14 at 05:21
    Storing the metadata, as well as the paths, to the files in the database shouldn't be a problem at all. It's just the files themselves. I've only dealt with this problem on Windows in my experience but I have definitely come across some very cool solutions for this problem on Linux in the past; let me see if I can dig that up. – Ruslan Feb 15 '14 at 05:24
  • Here's one open source distributed file system: http://en.wikipedia.org/wiki/Global_File_System – Ruslan Feb 15 '14 at 05:27
  • The way we solved it on Windows (it's actually a bit funny due to how archaic it is) was simply referencing the files via UNC... The UNC file share was a file-cluster with complete redundancy and regular backups, but the operating system wasn't aware of that. It just pulled the files via UNC as it would do with any other file. It is definitely not state-of-the-art, but was built that way about a decade ago and continues to work fine up to this day (dealing with tens of terabytes of user uploads, with traffic in the hundreds of GB/day range). Maybe something similar can be done via NFS? – Ruslan Feb 15 '14 at 05:30
  • Thank you for your solution but if we want to store our file in a NoSQL database, what is your suggestion? – Saeed Zarinfam Feb 15 '14 at 05:33
  • The folder structure in that system was split by date, where each date fragment would be a separate folder. For example, files uploaded on July 12, 2010, would be stored in /Uploads/2010/07/12, and so on. This way, when space needed to be cleared up, we would just copy the older years onto another file cluster and mount that new destination as the folder for that year. – Ruslan Feb 15 '14 at 05:33
  • I don't have enough knowledge about NoSQL to make a good recommendation. Hopefully someone else can chime in. – Ruslan Feb 15 '14 at 05:36
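The date-partitioned layout Ruslan describes can be sketched in a few lines. This is a hedged illustration, not code from the original system: the `/Uploads` root and the helper name `upload_dir` are assumptions for the example, but the `YYYY/MM/DD` split matches the scheme described above, and it is what makes "move a whole year to another cluster and re-mount it" possible.

```python
# Illustrative sketch of the date-partitioned upload layout described above.
# The root path "/Uploads" and the function name are assumed for this example.
from datetime import date

def upload_dir(d: date, root: str = "/Uploads") -> str:
    """Build the directory path for files uploaded on date d.

    Each date fragment is its own folder, so an entire year can later be
    copied to another file cluster and mounted back in place of the folder.
    """
    return f"{root}/{d.year:04d}/{d.month:02d}/{d.day:02d}"

print(upload_dir(date(2010, 7, 12)))  # /Uploads/2010/07/12
```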

1 Answer


From my (experienced) point of view, there are no obvious reasons against MongoDB GridFS. If your files are each below 16 MB in size, I would even store them directly in a MongoDB collection.
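The 16 MB figure comes from MongoDB's BSON document size limit: a file below that limit can be embedded in an ordinary document, while anything larger must go through GridFS, which splits the file into chunks. A minimal sketch of that decision, where the metadata headroom margin is an assumption of this example rather than anything MongoDB prescribes:

```python
# Sketch of the storage decision in the answer: MongoDB caps a single BSON
# document at 16 MB, so smaller files can be stored inline in a collection,
# while larger files should go through GridFS (which chunks them).
# METADATA_HEADROOM is an illustrative safety margin, not a MongoDB constant.

BSON_DOC_LIMIT = 16 * 1024 * 1024  # MongoDB's 16 MB document size limit
METADATA_HEADROOM = 64 * 1024      # assumed slack for field names and metadata

def use_gridfs(file_size: int) -> bool:
    """Return True if a file of this size should be stored via GridFS."""
    return file_size > BSON_DOC_LIMIT - METADATA_HEADROOM
```

With pymongo, inline storage would be a plain `insert_one` on a collection, while the GridFS path would use the driver's `gridfs.GridFS` wrapper (`fs.put(...)` / `fs.get(...)`); either way the surrounding metadata mentioned in the comments fits naturally in the same database.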

heinob