Synchronization issues in Hadoop federation

Question

I have some questions about Hadoop federation. As far as I know, it runs multiple masters (namenodes) at the same time.

My first question: when a client sends a request, how is it decided which master serves it?

My second question: is the metadata stored on each master kept in sync with the metadata on the others?

If the metadata is kept in sync, and two clients send requests to two different masters at the same time, how are the synchronization issues handled?

I hope my question is clear. I have only read the Apache Hadoop website. Any material or tutorials would be much appreciated, as would comments and corrections.

score 0 · Accepted Answer · answered Feb 27 '15 at 08:22


Using client-side mount tables, we can map file paths to namenodes (core-site.xml configuration below):

    <property>
        <name>fs.viewfs.mounttable.default.link./namenode1</name>
        <value>hdfs://namenode1:9001/home</value>
    </property>
    <property>
        <name>fs.viewfs.mounttable.default.link./namenode2</name>
        <value>hdfs://namenode2:9001/home</value>
    </property>

For example, in a put operation we can specify a path under /namenode1, and the request will go to namenode1:

bin/hadoop fs -put file.txt /namenode1/input

In HDFS Federation, each namenode manages its own metadata.
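
To make the path-to-namenode mapping concrete, here is a minimal Python sketch of how a client-side mount table could resolve a path. It is only an illustration, not Hadoop's actual ViewFs code: the `MOUNT_TABLE` dictionary and the `resolve` function are hypothetical names, and the two entries simply mirror the core-site.xml links above.

```python
# Hypothetical sketch of client-side mount-table resolution.
# This is NOT Hadoop's implementation; it only mirrors the two
# fs.viewfs.mounttable.default.link.* entries from the configuration.
MOUNT_TABLE = {
    "/namenode1": "hdfs://namenode1:9001/home",
    "/namenode2": "hdfs://namenode2:9001/home",
}


def resolve(path, mount_table=MOUNT_TABLE):
    """Return the namenode URI that a client-visible path maps to."""
    for mount_point, target in mount_table.items():
        # Match the mount point itself or anything underneath it.
        if path == mount_point or path.startswith(mount_point + "/"):
            # Keep the part of the path below the mount point.
            return target + path[len(mount_point):]
    raise ValueError(f"no mount point covers {path!r}")


print(resolve("/namenode1/input"))  # hdfs://namenode1:9001/home/input
```

The lookup happens entirely on the client, so a request for a path under /namenode1 never touches namenode2. That is why the namenodes do not need to coordinate with each other per request.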


Prahalad


If each namenode only manages its own metadata, how do the namenodes stay consistent with each other? – John Feb 27 '15 at 15:03
In HDFS HA, the namenodes are kept consistent with each other so that when the active namenode goes down, the standby takes over. HDFS Federation is different: on large clusters, the load is distributed across several namenodes, each of which maintains its own namespace. HDFS HA can be implemented on each of the federated namenodes. – Prahalad Feb 27 '15 at 15:53
For the first question, I think your solution is a good one. Thanks, Prahalad. But I am still confused about the synchronization issue. First, is it necessary to keep the metadata of all namenodes the same? If not, why not? If so, how? If you have any materials, they would be much appreciated. – John Feb 27 '15 at 16:30
In HDFS Federation, the namenodes hold different metadata. On large clusters, memory becomes a limiting factor if a single namenode is used, because all metadata is stored in memory to serve client requests. If more than one namenode is used and the load is distributed among them, each with its own namespace, the cluster can scale to far more nodes. You can read about HDFS Federation in Hadoop: The Definitive Guide. – Prahalad Feb 28 '15 at 06:02
