
In Hadoop, suppose I have a file A.txt containing some sample data, say:

Hello how are you? I am studying hadoop partitioning. Hadoop is interesting to learn and has good opportunities etc...

How does this data get stored in blocks? As per my understanding, say "Hello how" would be stored in one block and "are you?" in another block. If this is the case, then how does this work at retrieval time?

So basically I want to understand how the data in a file gets stored in HDFS blocks. Will it break the content arbitrarily, or will it split the content based on something like character boundaries or content sizes, etc.?

1 Answer


According to the official Hadoop site:

HDFS is designed to support very large files. Applications that are compatible with HDFS are those that deal with large data sets. These applications write their data only once but they read it one or more times and require these reads to be satisfied at streaming speeds. HDFS supports write-once-read-many semantics on files. A typical block size used by HDFS is 64 MB. Thus, an HDFS file is chopped up into 64 MB chunks, and if possible, each chunk will reside on a different DataNode.
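To make the arithmetic concrete, here is a minimal sketch (not part of the quoted documentation; the 200 MB file size is just an example) of how a file is chopped into chunks of the block size, with the last chunk holding whatever bytes remain:

    // Minimal sketch: a file is chopped into fixed-size chunks purely by byte count.
    public class BlockSplitSketch {
        public static void main(String[] args) {
            long blockSize = 64L * 1024 * 1024;   // 64 MB, the typical block size cited above
            long fileSize  = 200L * 1024 * 1024;  // hypothetical 200 MB file

            long fullBlocks  = fileSize / blockSize;               // 3 full 64 MB blocks
            long lastBlock   = fileSize % blockSize;               // 8 MB left over
            long totalBlocks = fullBlocks + (lastBlock > 0 ? 1 : 0);

            System.out.println(totalBlocks + " blocks, last block holds " + lastBlock + " bytes");
        }
    }

Note that a tiny file like A.txt above fits entirely in a single block, and a block only occupies as much disk space as the data it actually holds.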

HDFS only wants to make sure that files are split into evenly sized blocks that match the block size configured for the Hadoop instance (unless a custom value was specified for the file being stored). The split is made purely by byte count, not by words, lines, or character boundaries, so a sentence from your sample can start in one block and end in the next; at retrieval time the client reads the blocks back in order, so the file is reassembled exactly as it was written.
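As an illustration of the "custom value" case, here is a hedged sketch (not from the original answer) of writing a file with a per-file block size through the Hadoop FileSystem Java API; the path, replication factor, and 128 MB block size are example values only:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class CustomBlockSizeWrite {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();      // picks up core-site.xml / hdfs-site.xml
            FileSystem fs = FileSystem.get(conf);

            long blockSize = 128L * 1024 * 1024;           // custom 128 MB block size for this file
            Path path = new Path("/user/example/A.txt");   // hypothetical path

            // create(path, overwrite, bufferSize, replication, blockSize)
            try (FSDataOutputStream out = fs.create(path, true, 4096, (short) 3, blockSize)) {
                out.writeBytes("Hello how are you? I am studying hadoop partitioning.");
            }
        }
    }

If you do not pass a block size explicitly, the value of dfs.blocksize from the cluster configuration is used.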

You can read more about this under the Data Organization section of the official Hadoop site. You can also refer to Data Blocks in the Hadoop Distributed File System (HDFS).
