HDFS vs GridFS: When to use which?

Question

HDFS and GridFS are two great technologies for distributed file storage, but what are the differences between them? What kinds of problems is each one better suited to?

score 3 · Answer 1 · answered May 22 '12 at 13:33

HDFS is intended for batch processing (you know, when you run a query that reads many of your files one by one), but it really sucks at random-access operations, and it is a pain in the neck to maintain or even deploy (you know, all those ZooKeepers, NameNodes and so on). GridFS, on the other hand, is slower for batch work but holds up well when you do a lot of random accesses, though it has a bigger storage overhead than HDFS.

I would say that you should use HDFS for analytics and GridFS for backing a website.
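
To get a feel for the storage-overhead point, here is a rough back-of-the-envelope sketch in Python. It assumes the default piece sizes (255 KiB GridFS chunks, 128 MiB HDFS blocks), both of which are configurable and have differed between releases; the `pieces` helper is made up for illustration. Each GridFS chunk is stored as its own document in the chunks collection, so more pieces means more per-piece bookkeeping.

```python
# Assumed defaults; both are configurable and have changed between releases.
GRIDFS_CHUNK_BYTES = 255 * 1024          # GridFS chunk size (assumption: current driver default)
HDFS_BLOCK_BYTES = 128 * 1024 * 1024     # dfs.blocksize (assumption: Hadoop 2.x+ default)


def pieces(file_bytes, piece_bytes):
    """Number of fixed-size pieces needed to hold a file of file_bytes bytes."""
    # Integer ceiling division, so large sizes do not go through floats.
    return (file_bytes + piece_bytes - 1) // piece_bytes


one_gib = 1024 ** 3
print("GridFS chunks for 1 GiB:", pieces(one_gib, GRIDFS_CHUNK_BYTES))
print("HDFS blocks for 1 GiB:  ", pieces(one_gib, HDFS_BLOCK_BYTES))
```

With these assumed defaults a 1 GiB file becomes a few thousand GridFS chunks versus a handful of HDFS blocks, which fits the picture of small pieces suiting random reads and large blocks suiting sequential scans.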

score 2 · Answer 2 · answered Jan 31 '12 at 11:15

Use HDFS if you are using Hadoop, and use GridFS if you are using MongoDB. Neither is that great for just storing random files. They are built to work with their respective platforms.

Donald Miner

score 1 · Answer 3 · answered Jan 31 '12 at 11:21

I would recommend GridFS if you are only going to store your files, without any analytics or map-reduce jobs. It's easier to customize and maintain. I used it for a file hosting application. HDFS is overkill in this case.
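
As a rough illustration of the plain store-and-fetch case, here is a minimal sketch assuming Python with the `pymongo` package (which ships the `gridfs` module). The connection URI, the `filehost` database name, and the helper names `open_gridfs`, `store_file` and `fetch_file` are all placeholders, not anything from a real deployment.

```python
import os


def open_gridfs(uri="mongodb://localhost:27017", db_name="filehost"):
    """Connect and return a GridFS handle. URI and database name are placeholders."""
    # Imported lazily so the helpers below can be used without pymongo installed.
    from pymongo import MongoClient
    import gridfs
    return gridfs.GridFS(MongoClient(uri)[db_name])


def store_file(fs, path):
    """Upload the file at `path`; returns the id GridFS assigns to it."""
    with open(path, "rb") as f:
        # put() accepts bytes or a file-like object; extra kwargs become file metadata.
        return fs.put(f, filename=os.path.basename(path))


def fetch_file(fs, file_id):
    """Return the full contents of a stored file as bytes."""
    return fs.get(file_id).read()
```

Against a running MongoDB this would be used as `fs = open_gridfs()`, then `file_id = store_file(fs, "photo.jpg")` and later `fetch_file(fs, file_id)`.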

Anton

score 0 · Accepted Answer · answered Jan 31 '12 at 11:21

GridFS is a little slow compared to other file systems ... first think about other file systems, like Ceph ...

Distributed file system - Wikipedia, the free encyclopedia -> http://en.wikipedia.org/wiki/Distributed_file_system

I think HDFS really is a file system, but GridFS is only a database grid.

In the end, run benchmarks, but I strongly suggest using a real distributed file system.

nginx-gridfs Benchmarking Raw Results | ypass.net -> http://www.ypass.net/solaris/nginx-gridfs-benchmarks/rawresults.php
