I have been confused about this question for quite a long time. There are quite a few explanations of why HDFS uses such a large block size compared to the block size of the OS file system or the disk block. I understand the explanation that this lowers the traffic volume and the NameNode's storage requirements for metadata management. What always confuses me is the explanation about "minimizing the seek time / transfer time ratio", as stated in the following post: Why Is a Block in HDFS So Large?
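As far as I can tell, the back-of-the-envelope reasoning behind that claim is something like the sketch below. The ~10 ms seek time and ~100 MB/s transfer rate are numbers I've often seen quoted for spinning disks, not figures I've measured, and the "one seek per block" assumption is exactly the part I'm unsure about:

```python
# Assumed (not measured) figures I've seen quoted for a spinning disk:
# ~10 ms per seek, ~100 MB/s sustained sequential transfer rate.
SEEK_TIME_S = 0.010          # time to position the disk head once
TRANSFER_RATE_BPS = 100e6    # sequential read rate, bytes per second

def read_time(total_bytes, block_bytes):
    """Time to read total_bytes, assuming one seek per block (the assumption in question)."""
    blocks = total_bytes / block_bytes
    seek = blocks * SEEK_TIME_S
    transfer = total_bytes / TRANSFER_RATE_BPS
    return seek, transfer

for block in (4 * 1024, 128 * 1024 * 1024):        # 4 KB vs 128 MB blocks
    seek, transfer = read_time(1 * 1024**3, block)  # read 1 GB in total
    print(f"block={block:>12,d} B  seek={seek:8.2f}s  "
          f"transfer={transfer:6.2f}s  seek/transfer={seek/transfer:.4f}")
```

Under those assumptions, the seek overhead dominates with 4 KB blocks but is negligible with 128 MB blocks, which is how I read the claim. But the arithmetic only works if there really is roughly one seek per block, which leads to my questions below.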
I know my question might seem naive because I don't come from a CS degree background, so I lack quite a bit of knowledge about topics like the composition of a modern computer. Please excuse me for that.
My confusion is mainly caused by the following considerations. The post states:
> A block will be stored as a contiguous piece of information on the disk, which means that the total time to read it completely is the time to locate it (seek time) + the time to read its content without doing any more seeks.
- Is it the case that a whole HDFS block is stored on the disk in a contiguous manner?
- When does the seek happen? Doesn't it happen only once per block on the disk?
- If the block is stored contiguously, can we read the whole block with only a single seek?
I think my doubt is well founded, since the conclusion that a large HDFS block reduces the seek/transfer time ratio can only be valid if all three conditions above are met. Otherwise, the conclusion doesn't make sense to me. I hope someone can point me to a solid source that confirms or refutes my guess. Thanks in advance.