I have a multi-node Hadoop cluster consisting of two machines. The first machine (configured as both master and slave) runs the NameNode and a DataNode, and the second machine (configured as a slave) runs a DataNode.
I want to upload data and have it distributed almost equally between them.
I have two scenarios:
First: suppose I have a file, file1, 500 MB in size, and I upload it on the first machine using:
hadoop fs -put file1 hdfspath
Will it be split across both DataNodes, or stored only on the first machine?
When does the distribution happen: only after the data exceeds the block size on the first machine, or is there another criterion?
Will it be divided equally, 250 MB on each DataNode?
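To make the first scenario concrete, here is a rough sketch of how many blocks I would expect file1 to be split into. The 128 MB block size is an assumption on my part (I believe it is the default dfs.blocksize in Hadoop 2.x and later, while 1.x used 64 MB); my cluster's actual setting may differ.

```shell
#!/bin/sh
# Assumed values: 500 MB file, 128 MB HDFS block size (not confirmed for my cluster).
FILE_MB=500
BLOCK_MB=128

# Ceiling division: number of blocks the file would be split into.
BLOCKS=$(( (FILE_MB + BLOCK_MB - 1) / BLOCK_MB ))

# Size of the final, partially filled block.
LAST_MB=$(( FILE_MB - (BLOCKS - 1) * BLOCK_MB ))

echo "blocks=$BLOCKS last_block_mb=$LAST_MB"
```

Under that assumption the file would be 4 blocks (three of 128 MB and one of 116 MB), so an exact 250 MB / 250 MB split does not look possible unless the blocks happen to land two and two.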
Second: suppose I have 250 files, each 2 MB in size, and I upload the folder containing them, dir1, on the first machine using:
hadoop fs -put dir1 hdfspath
Same question: will the data be distributed across both machines, or stored only on the first machine? Also, when and how does the distribution occur?
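For the second scenario, my understanding is that an HDFS block never holds data from more than one file, so each 2 MB file would be its own (small) block. A quick sketch of the count, again assuming a 128 MB block size that I have not verified on my cluster:

```shell
#!/bin/sh
# Assumed values: 250 files of 2 MB each, 128 MB HDFS block size.
NUM_FILES=250
SMALL_FILE_MB=2
BLOCK_SIZE_MB=128

# Each file needs at least one block; ceiling division gives blocks per file.
BLOCKS_PER_FILE=$(( (SMALL_FILE_MB + BLOCK_SIZE_MB - 1) / BLOCK_SIZE_MB ))

TOTAL_BLOCKS=$(( NUM_FILES * BLOCKS_PER_FILE ))
TOTAL_MB=$(( NUM_FILES * SMALL_FILE_MB ))

echo "blocks_per_file=$BLOCKS_PER_FILE total_blocks=$TOTAL_BLOCKS total_mb=$TOTAL_MB"
```

So the directory would be 250 blocks totalling 500 MB, and the question is really how those 250 blocks get placed across the two DataNodes.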
Thank you.