You need to understand the difference between the logical (virtual) view of a file in HDFS and its actual physical storage.
HDFS (Hadoop Distributed File System) only specifies how data is laid out across DataNodes. When you store a file in HDFS, it is presented as a single logical HDFS file, but the bytes are physically stored as blocks on the local disks of the DataNodes.
Let's see in detail how it works:
HDFS is a block-structured file system: it breaks each file into blocks of a fixed size (64 MB by default). These blocks are stored across a cluster of machines composed of one NameNode and several DataNodes.
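To make the block size concrete, here is a minimal Java sketch using the standard Hadoop `FileSystem` API (the path `/user/test/sample.txt` is just a made-up example) that asks the cluster which block size it would use for a file:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class BlockSizeCheck {
    public static void main(String[] args) throws Exception {
        // Picks up core-site.xml / hdfs-site.xml from the classpath
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);

        // Hypothetical path, for illustration only
        Path p = new Path("/user/test/sample.txt");

        // Block size HDFS would use when writing this file
        long blockSize = fs.getDefaultBlockSize(p);
        System.out.println("Default block size: " + blockSize + " bytes");

        fs.close();
    }
}
```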
The NameNode maintains the file system metadata (e.g., the names of files and directories, and which blocks make up each file) and regulates client access to files;
it also handles operations such as open, close, and rename. To open a file, a client contacts the NameNode and retrieves the list of locations of the blocks that make up the file. These locations identify the DataNodes that hold each block. The client then reads the file data directly from those DataNode servers, possibly in parallel. The NameNode is not involved in this bulk data transfer, which keeps its overhead to a minimum.
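To illustrate the read path, here is a small Java sketch (the file path is hypothetical) that asks the NameNode for the block locations of a file and prints which DataNodes host each block; the actual data would then be streamed directly from those DataNodes:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ListBlockLocations {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);

        // Hypothetical file, for illustration only
        Path file = new Path("/user/test/bigfile.dat");
        FileStatus status = fs.getFileStatus(file);

        // Metadata lookup answered by the NameNode; no file data is transferred here
        BlockLocation[] blocks =
                fs.getFileBlockLocations(status, 0, status.getLen());

        for (BlockLocation block : blocks) {
            System.out.println("offset=" + block.getOffset()
                    + " length=" + block.getLength()
                    + " hosts=" + String.join(",", block.getHosts()));
        }

        fs.close();
    }
}
```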
- DataNodes are responsible for serving read/write requests from clients and for block creation, deletion, and replication. So every block in HDFS is actually stored on the local disk of some DataNode; a minimal write sketch is shown below.
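For completeness, a minimal write sketch under the same assumptions (made-up path, default configuration): the client just writes a byte stream to what looks like a single file, and HDFS transparently splits it into blocks and replicates them across DataNodes behind the scenes:

```java
import java.nio.charset.StandardCharsets;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class WriteExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);

        // Hypothetical destination path
        Path out = new Path("/user/test/hello.txt");

        // The client sees a single logical file; block placement and
        // replication on the DataNodes happen behind this stream.
        try (FSDataOutputStream stream = fs.create(out, true)) {
            stream.write("hello hdfs".getBytes(StandardCharsets.UTF_8));
        }

        fs.close();
    }
}
```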