We have set up a Hadoop cluster with 2 machines and are trying to use a cluster in our real-time projects. We need information about uploading data in a multi-node cluster. Suppose I have 9 data nodes: which slave node should I upload the data to? Can I choose to upload the data to 2 specific slave nodes? If I upload the data into HDFS, is it replicated onto the other slave nodes? Also, we observed that HDFS is currently using the /tmp location; if /tmp is full, which location will HDFS use?
1 Answer
The purpose of adding more nodes to the cluster is to enlarge the data storage. Are you looking to secure the cluster, i.e., grant privileges so that only certain users can upload data into HDFS? If so, you can implement Kerberos authentication, or simply authorize those users to upload the data.
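For example, a minimal sketch of restricting uploads with plain HDFS permissions, assuming the commands run as the HDFS superuser; the user, group, and directory names are placeholders:

```
# create an upload directory and hand it to one user (placeholder names)
hdfs dfs -mkdir -p /data/uploads
hdfs dfs -chown srikanth:hadoop /data/uploads
# 750: owner can write, group can read, everyone else has no access
hdfs dfs -chmod 750 /data/uploads
```

With Kerberos enabled the owner would be the authenticated principal rather than the plain Unix user, but the permission model works the same way.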
Data replication: yes, once the data is uploaded to HDFS it is replicated onto the other nodes. When a data node is decommissioned, HDFS takes care of it: the data is moved from the decommissioned node onto the other nodes.
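Regarding the /tmp question: by default HDFS keeps its data under hadoop.tmp.dir, which resolves to /tmp/hadoop-${user.name}, so on a real cluster you should point the storage directories at dedicated disks in hdfs-site.xml. A minimal sketch, assuming a Hadoop 2.x-style setup; all paths are placeholders:

```
<!-- hdfs-site.xml: placeholder paths on dedicated disks -->
<property>
  <name>dfs.namenode.name.dir</name>
  <value>/data/disk1/hdfs/namenode</value>
</property>
<property>
  <name>dfs.datanode.data.dir</name>
  <!-- the DataNode spreads blocks across all listed directories;
       if one disk fills up, it keeps writing to the others -->
  <value>/data/disk1/hdfs/datanode,/data/disk2/hdfs/datanode</value>
</property>
<property>
  <name>dfs.replication</name>
  <!-- default replication factor for newly written files -->
  <value>3</value>
</property>
```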

karthik
hi karthik, thanks for your reply; I need a small clarification on the points below. If 2 different clients from the same group (company) upload data to different data nodes, will it automatically be replicated across all the data nodes? But as per my theoretical knowledge, the client should approach the NameNode, and it will tell the client which specific data nodes to upload to. How can I achieve this? As of now I am uploading manually on one machine. – srikanth Jul 03 '15 at 13:41
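A minimal sketch of what the comment describes, assuming a machine with a configured Hadoop client (fs.defaultFS in core-site.xml pointing at the NameNode); the paths are placeholders. The client never picks a data node itself: it contacts the NameNode, which chooses the target data nodes for each block and drives the replication:

```
# upload from any client machine; the NameNode decides which
# data nodes receive the blocks and their replicas
hdfs dfs -put /local/path/file.txt /data/uploads/

# verify where the blocks and replicas actually landed
hdfs fsck /data/uploads/file.txt -files -blocks -locations

# optionally change the replication factor of the file (-w waits)
hdfs dfs -setrep -w 3 /data/uploads/file.txt
```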