I am new to Hadoop and confused about how HDFS works with ZFS or BTRFS.
Can physical drives be mounted using ZFS and then have HDFS installed on top of ZFS?
Or can HDFS be installed directly?
Yes; my cluster uses btrfs on the partitions I've configured for HDFS. The one thing I would caution you about is btrfs' transparent compression feature, which is enabled via a mount option. The Hadoop HDFS daemons are aware of the size of each volume configured for HDFS on a node and of the free space on it, and if you enable compression those numbers become "unreal," i.e., a report that a "500 MiB" volume is "50% full" on a compression-mounted volume neither means it can hold only another 500 MiB nor that it's really half full (with compression enabled on a btrfs volume, you can write a file of all zeros that is vastly larger than the volume's actual size). Because of that unreliability, and because of the extra CPU overhead that would go into compression and decompression, I would avoid the temptation, even though the effective compression ratios for a lot of the data typically going into HDFS might be quite favorable.
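If you want to see why those numbers go sideways, here is a rough sketch of the kind of per-volume query the DataNode's capacity reporting boils down to; it effectively asks the OS, much as `df` does. The mount point is just a placeholder for one of your HDFS data directories.

```java
// Rough illustration of the per-volume numbers HDFS capacity reports are
// based on. On a btrfs volume mounted with compression, these describe raw
// bytes on disk, not how much logical data will actually still fit.
import java.io.File;

public class VolumeSpace {
    public static void main(String[] args) {
        // Placeholder path; substitute one of your DataNode data directories.
        File volume = new File("/data/hdfs");
        long totalGiB = volume.getTotalSpace() >> 30;
        long freeGiB  = volume.getUsableSpace() >> 30;
        System.out.printf("total=%d GiB, free=%d GiB%n", totalGiB, freeGiB);
    }
}
```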
Another reason to avoid compressing HDFS volumes is that, as the daemons shuffle blocks around to satisfy your replication setting, machines will be uncompressing blocks on read on one node only to write them back out compressed on another node.
Having said this, one potential feature the Hadoop team may want to consider implementing would be to handle compression at the HDFS level; under that regime blocks would only be compressed or decompressed when written or read by client code, including the hdfs utility. I'm just not sure it would be worth the CPU overhead, though.
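For what it's worth, you can already get much of that effect today by compressing in the application rather than in the filesystem, using Hadoop's CompressionCodec API. A minimal sketch, assuming a Gzip codec and an illustrative output path:

```java
// Compress in the client before the bytes ever reach the DataNodes, so the
// block sizes HDFS reports match what is actually on disk.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.compress.CompressionCodec;
import org.apache.hadoop.io.compress.GzipCodec;
import org.apache.hadoop.util.ReflectionUtils;

import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

public class CompressedHdfsWrite {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);

        CompressionCodec codec =
                ReflectionUtils.newInstance(GzipCodec.class, conf);
        Path out = new Path("/user/example/data.txt.gz"); // illustrative path

        try (OutputStream raw = fs.create(out);
             OutputStream compressed = codec.createOutputStream(raw)) {
            compressed.write("hello hdfs\n".getBytes(StandardCharsets.UTF_8));
        }
    }
}
```

Data written this way stays compressed on every replica, so replication and the balancer never have to decompress it in transit.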
Yes. HDFS can be slapped on just about any Linux file system that supports the standard '/'-separated hierarchical directory structure, at least a couple of levels deep.
(Source: https://hadoop.apache.org/docs/r1.2.1/hdfs_design.html )
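To the original question: the DataNode just stores blocks as ordinary files under whatever local directories you point it at, so a ZFS or btrfs mount works as well as ext4 or XFS. A hedged sketch of setting that property programmatically; in practice it normally lives in hdfs-site.xml, and the mount points below are placeholders:

```java
import org.apache.hadoop.conf.Configuration;

public class DataDirExample {
    public static void main(String[] args) {
        Configuration conf = new Configuration();
        // dfs.datanode.data.dir (dfs.data.dir on older 1.x releases) lists the
        // local directories the DataNode stores its block files in; the paths
        // here are placeholders for directories on your ZFS/btrfs/ext4 mounts.
        conf.set("dfs.datanode.data.dir", "/mnt/zpool1/hdfs,/mnt/zpool2/hdfs");
        System.out.println(conf.get("dfs.datanode.data.dir"));
    }
}
```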