
Can somebody walk through this calculation and give a lucid explanation?

A quick calculation shows that if the seek time is around 10 ms and the transfer rate is 100 MB/s, to make the seek time 1% of the transfer time, we need to make the block size around 100 MB. The default is actually 64 MB, although many HDFS installations use 128 MB blocks. This figure will continue to be revised upward as transfer speeds grow with new generations of disk drives.
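Spelled out, the arithmetic in that quote seems to be the following (a rough sketch; the variable names are just illustrative):

```python
# Pick a block size so that one seek costs ~1% of the time spent
# transferring the block (the rule of thumb in the quoted passage).
seek_time_s = 0.010          # ~10 ms average seek
transfer_rate_mb_s = 100.0   # ~100 MB/s sustained transfer
target_ratio = 0.01          # seek time should be ~1% of transfer time

# transfer_time = block_size / transfer_rate, and we want
# seek_time / transfer_time == target_ratio, so:
block_size_mb = seek_time_s * transfer_rate_mb_s / target_ratio
print(block_size_mb)         # 100.0 -> roughly 100 MB, hence the 64/128 MB defaults
```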


3 Answers


A block will be stored as a contiguous piece of information on the disk, which means that the total time to read it completely is the time to locate it (seek time) + the time to read its content without doing any more seeks, i.e. sizeOfTheBlock / transferRate = transferTime.

If we keep the ratio seekTime / transferTime small (close to .01 in the text), it means we are reading data from the disk almost as fast as the physical limit imposed by the disk, with minimal time spent looking for information.

This is important because in MapReduce jobs we typically traverse (read) the whole data set (represented by an HDFS file, a folder, or a set of folders) and apply logic to it. Since we have to spend the full transferTime anyway to get all the data off the disk, let's try to minimise the time spent doing seeks and read in big chunks, hence the large size of the data blocks.

In more traditional disk access software, we typically do not read the whole data set every time, so we would rather spend more time doing plenty of seeks on smaller blocks than lose time transferring data we won't need.
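To make the trade-off concrete, here is a small sketch (the function name and the 1 GB data set are illustrative assumptions, not anything from HDFS) comparing the time to scan a whole data set with different block sizes, assuming one seek per block:

```python
def scan_time_s(dataset_mb, block_mb, seek_s=0.010, rate_mb_s=100.0):
    """Time to read the whole data set: one seek per block plus the pure transfer time."""
    n_blocks = dataset_mb / block_mb
    return n_blocks * seek_s + dataset_mb / rate_mb_s

pure_transfer_s = 1000 / 100.0   # 10 s to stream 1 GB with no seeks at all
for block_mb in (1, 4, 64, 128):
    t = scan_time_s(1000, block_mb)
    overhead = t / pure_transfer_s - 1
    print(f"{block_mb:>4} MB blocks: {t:6.2f} s  ({overhead:.1%} seek overhead)")
```

With 1 MB blocks the scan takes twice as long as pure streaming, while at 64-128 MB the seek overhead drops to roughly 1%, which is the ratio the text is aiming for.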

Svend
  • It was a great reply, thanks for the quick answer. But one small query: as the seek time is 10 ms, does it mean it takes 10 ms to read a 100 MB block of data? In 1 s it will then be able to seek 100 blocks of data (100 * 100 MB). It has sought 10,000 MB of data, but during that period only 100 MB of data has been transferred, and the remaining 9,900 MB has yet to be transferred. So my point is, even if we seek as fast as possible, we still have to sit and wait. Could you please clarify? – Kumar Mar 12 '14 at 18:35
  • The seek time is the time we need to spend before reading any data; roughly speaking, it is the time necessary to move the read head to where the data sits physically on the disk (plus other similar kinds of overhead): see here: http://en.wikipedia.org/wiki/Hard_disk_drive_performance_characteristics#Seek_time In order to read 100 MB stored contiguously, we spend `10 ms + 100 MB / (100 MB/s) = 1.01 s`. So a big proportion of that time is spent actually reading the data and only a small one is spent seeking. If those same 100 MB were stored as 10 blocks, that would give `10 * 10 ms + 100 MB / (100 MB/s) = 1.1 s`. – Svend Mar 12 '14 at 19:10
  • Got it Svend. Thanks a lot for the quick turnaround. – Kumar Mar 13 '14 at 06:51
  • This is a good explanation, compared to other links I've seen. – redeemed Nov 11 '15 at 07:22
  • I just got here out of curiosity and got one of the clearest explanations of this issue I have ever read. Congrats. – aran Jan 07 '21 at 21:45

Since 100 MB is divided into 10 blocks, you have to do 10 seeks, and the transfer time for each block is 10 MB / (100 MB/s). So (10 ms * 10) + (10 MB / (100 MB/s)) * 10 = 1.1 s, which is still greater than 1.01 s.
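As a quick sanity check on the arithmetic above (a sketch using the same 10 ms seek and 100 MB/s figures; the variable names are made up for illustration):

```python
seek_s = 0.010       # 10 ms per seek
rate_mb_s = 100.0    # 100 MB/s transfer rate
total_mb = 100.0     # amount of data to read

# One contiguous 100 MB block: a single seek, then stream everything.
one_block_s = seek_s + total_mb / rate_mb_s                   # 1.01 s

# Ten 10 MB blocks: ten seeks, same total amount of data transferred.
ten_blocks_s = 10 * (seek_s + (total_mb / 10) / rate_mb_s)    # 1.1 s

print(round(one_block_s, 3), round(ten_blocks_s, 3))          # 1.01 1.1
```

More blocks means more seeks for the same amount of data, so the total read time only goes up.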


Since the 100 MB is divided among 10 blocks, each block is only 10 MB, as it is HDFS. Then it should be 10 * 10 ms + 10 MB / (100 MB/s) = 0.1 s + 0.1 s = 0.2 s, or even less time.

Bartłomiej Semańczyk