I have been confused by this question for quite a long time. There are quite a few explanations of why HDFS uses such a large block size compared to the block size of an OS file system or a disk block. I understand the explanation that a large block size lowers the traffic volume and the storage requirements of the NameNode for metadata management. What always confuses me is the explanation about "minimizing the seekTime/transferTime ratio", as stated in the post "Why Is a Block in HDFS So Large?".

I know my question might seem dumb. I don't have a CS degree, so I lack background on topics like how a modern computer is organized. Please excuse me for that.

My confusion comes mainly from the following statement in that post and the questions it raises:

A block will be stored as a contiguous piece of information on the disk, which means that the total time to read it completely is the time to locate it (seek time) + the time to read its content without doing any more seeks
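To make sure I read this claim correctly, here is a rough sketch of the arithmetic as I understand it. The 10 ms seek time and 100 MB/s transfer rate are illustrative assumptions of mine, not measured values, and the function name is made up for this example. The model also assumes exactly one seek per block, which is the very assumption I am asking about below.

```python
def seek_to_transfer_ratio(block_size_mb, seek_ms=10.0, transfer_mb_per_s=100.0):
    """Ratio of seek time to transfer time for reading one block.

    Assumes the block is contiguous on disk, so reading it costs a single
    seek plus a sequential transfer. The default seek time and transfer
    rate are illustrative guesses, not measurements.
    """
    transfer_ms = block_size_mb / transfer_mb_per_s * 1000.0
    return seek_ms / transfer_ms


if __name__ == "__main__":
    # Compare a small file-system-sized block with typical HDFS block sizes.
    for size_mb in (0.004, 1, 64, 128):
        ratio = seek_to_transfer_ratio(size_mb)
        print(f"{size_mb:>8} MB block: seek/transfer = {ratio:.4f}")
```

Under these assumed numbers a 100 MB block gives a ratio of 0.01, so the seek is about 1% of the transfer time. The ratio only shrinks as the block grows if the one-seek-per-block assumption actually holds.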

  1. Is a whole HDFS block actually stored contiguously on the disk?
  2. When does a seek happen? Does it happen only once per block on the disk?
  3. If the block is stored contiguously, can we read the whole block with a single seek?

I think my doubt is well grounded, since the conclusion that a large HDFS block reduces the seek/transfer time ratio can only be valid if all three of these conditions hold. Otherwise, the conclusion doesn't make sense to me. I hope someone can point me to a solid source that confirms or refutes my guess. Thanks in advance.

Boyu Zhang