In HDFS, a file is partitioned into blocks which are distributed across several nodes.
I am wondering whether that is also true of the following distributed file systems: NFS and the Andrew File System (and Bayou and Coda, if you happen to know)?
Thanks.
There are multiple versions of the NFS protocol: v2, v3, v4.0, v4.1 and v4.2. Version 4.1 of the protocol defines parallel NFS, known as pNFS, which specifies how distributed data can be accessed via the NFS protocol. A pNFS-capable version is available in Linux kernel 3.9 and newer.
pNFS describes several ways (layout types) in which a client can access distributed data, including the file and block layouts.
Though pNFS allows (and expects) a file to be striped over multiple data servers (à la RAID-0), all existing file-layout server implementations distribute sets of files and keep all blocks of a single file on the same server. This can of course change over time.
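To make the difference concrete, here is a minimal sketch of the two placement styles mentioned above. It is generic RAID-0 arithmetic, not the pNFS wire format or any real server's code; the function names, the 64 KiB stripe unit and the CRC-based file placement are all assumptions chosen for illustration.

```python
import zlib

def locate_striped(offset, stripe_unit, num_servers):
    """RAID-0 style: map a byte offset in a file to
    (data server index, offset within that server's share of the file)."""
    stripe_no = offset // stripe_unit              # which stripe unit the byte falls in
    server = stripe_no % num_servers               # stripe units rotate over the servers
    local_offset = (stripe_no // num_servers) * stripe_unit + offset % stripe_unit
    return server, local_offset

def locate_whole_file(path, num_servers):
    """Whole-file placement: every byte of a file lives on one server.
    Picking the server by CRC32 of the path is an illustrative assumption."""
    return zlib.crc32(path.encode("utf-8")) % num_servers

STRIPE_UNIT = 64 * 1024  # hypothetical stripe unit

# With striping, consecutive 64 KiB chunks of one file land on different servers.
for off in (0, STRIPE_UNIT, 4 * STRIPE_UNIT + 10):
    print(off, locate_striped(off, STRIPE_UNIT, 4))

# With whole-file placement, the offset does not matter at all.
print(locate_whole_file("/export/data/file.bin", 4))
```

With striping, a large sequential read can pull from all data servers in parallel; with whole-file placement, parallelism only appears when many different files are read at once.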
I would expect the block-layout-based Linux server to support file striping, but I am not an expert on it. You had better check the Red Hat admin guide.
NOTE: The HDFS NFS gateway uses NFSv3 and proxies all (distributed) data through a single node.
Of those, only HDFS was designed from the beginning to split data blocks across many nodes, with an on-disk format for doing so. Dell's HPC group calls these parallel file systems. Or I suppose you could say scale-out.
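As a rough illustration of what "split data blocks across many nodes" means, the sketch below cuts a file into fixed-size blocks and assigns each block to several nodes. The 128 MiB block size and replication factor of 3 match the usual HDFS defaults, but the round-robin placement and the node names are assumptions for illustration only; real HDFS uses a rack-aware placement policy in the NameNode.

```python
def split_into_blocks(file_size, block_size=128 * 1024 * 1024):
    """Return (offset, length) pairs covering a file of file_size bytes."""
    blocks = []
    offset = 0
    while offset < file_size:
        length = min(block_size, file_size - offset)  # the last block may be short
        blocks.append((offset, length))
        offset += length
    return blocks

def place_replicas(num_blocks, nodes, replication=3):
    """Naive round-robin placement (NOT the HDFS policy).
    Replicas of one block are distinct only if replication <= len(nodes)."""
    return [
        [nodes[(b + r) % len(nodes)] for r in range(replication)]
        for b in range(num_blocks)
    ]

MIB = 1024 * 1024
blocks = split_into_blocks(300 * MIB)               # a hypothetical 300 MiB file
placement = place_replicas(len(blocks), ["n1", "n2", "n3", "n4"])
for (offset, length), replicas in zip(blocks, placement):
    print(offset // MIB, length // MIB, replicas)
```

The point is that a single file ends up on several nodes at block granularity, which is what the server-level or volume-level replication in the older systems does not do.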
The rest are network protocols that access storage arrays remotely. This was the distributed challenge in the 1980s: how to connect workstations to the department's central storage? While AFS, Coda, and Bayou have various replication schemes, they operate at the server or volume level, not the block or extent level.
pNFS is an optional add-on to NFS that allows block access via arbitrary data protocols. This abstraction allows some clever engineering around where data blocks are stored, but it is quite different from HDFS's design.
Note that Ceph, Lustre, and HDFS all have means to provide file access via NFS. Parallel file systems that spread object storage across many nodes sometimes offer NFS as the least common denominator for clients who just want a file share. Even more layers of abstraction...