I'm looking for a way to plug a new file system into Hadoop so that I can benchmark its performance against HDFS. I'm new to Hadoop, so please feel free to correct me if I've asked the wrong question. If it helps, I'll be using Amazon's EMR.
4 Answers
Yes, you can run Hadoop on top of other file systems; they just have to implement Hadoop's FileSystem interface. Here's an example of running it on a new file system called Tachyon. On Amazon, the obvious choice would be to run on the S3 file system.
I'm not an expert on this part, but it seems relatively trivial to make your file system transparently support Hadoop MapReduce. Here is how Tachyon did it: TachyonFileSystem. Basically, it just extends the Hadoop FileSystem class.

You will need to create a Hadoop file system driver for your new file system. This would be a class that extends org.apache.hadoop.fs.FileSystem. Examples of such 'drivers' are the well-known DistributedFileSystem (a.k.a. HDFS), the LocalFileSystem, S3FileSystem, etc. You then have to register your new file system with a scheme in core-site.xml. Let's say you register 'gaurav':
<property>
  <name>fs.gaurav.impl</name>
  <value>com.package.GauravFileSystem</value>
</property>
You can now reference files in your own file system with the registered scheme: gaurav://somepath/somename. Optionally, you can make your new file system the default by changing fs.default.name (named fs.defaultFS in Hadoop 2 and later). Your cluster should now run on top of your own file system (if everything is correct and works, of course).
See HADOOP-9629 for an example of a complete Hadoop file system.
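To make the scheme registration more concrete, here is a minimal sketch, using only the JDK, of how a URI such as gaurav://somepath/somename breaks down and which configuration key its scheme maps to. The class name SchemeLookup and the helper implKeyFor are hypothetical names made up for this illustration; the fs.<scheme>.impl key pattern is the one used in the core-site.xml entry above. Hadoop's own lookup does more than this, so treat it as an illustration rather than the actual implementation.

```java
import java.net.URI;

// Hypothetical helper class, for illustration only (not part of Hadoop).
class SchemeLookup {

    // Builds the configuration key that registers a driver for the URI's scheme,
    // following the "fs.<scheme>.impl" pattern from core-site.xml.
    static String implKeyFor(String location) {
        URI uri = URI.create(location);
        return "fs." + uri.getScheme() + ".impl";
    }

    public static void main(String[] args) {
        URI uri = URI.create("gaurav://somepath/somename");
        System.out.println(uri.getScheme());     // gaurav
        System.out.println(uri.getAuthority());  // somepath
        System.out.println(uri.getPath());       // /somename
        System.out.println(implKeyFor("gaurav://somepath/somename")); // fs.gaurav.impl
    }
}
```

The scheme part of the URI is what selects the driver class, so any path that starts with gaurav:// ends up in the class registered under fs.gaurav.impl.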

No, Hadoop is only usable with HDFS ... MapR is using another version of HDFS.
But you can develop your own MapReduce on your DFS and compare it to Hadoop.

Another way is to use ServiceLoader: place a configuration file at META-INF/services/org.apache.hadoop.fs.FileSystem containing the fully qualified name of the implementation class. You can then obtain the file system like this:
FileSystem.get(new URI("{SCHEME}://" + "{VALUE}" + "/"), conf)
