
I'm familiar with the infrastructure or architecture of Cloudera:

Master Nodes include NameNode, SecondaryNameNode, JobTracker, and HMaster. Slave Nodes include DataNode, TaskTracker, and HRegionServer.

Master roles should each run on their own node (unless it's a small cluster, in which case the SecondaryNameNode, JobTracker, and HMaster may be combined, and even the NameNode if it's a really small cluster).

The slave roles (DataNode, TaskTracker, and HRegionServer) should always be colocated on the same node. The more slave nodes, the merrier.
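To make the placement rules above concrete, here is a minimal Python sketch that assigns roles to hosts. The role names come from the question; the function name `plan_layout` and the `small_cluster` / `tiny_cluster` thresholds are hypothetical, since the question gives no node counts for "small" or "really small".

```python
# Role names are from the question; the size thresholds are hypothetical.
MASTER_ROLES = ["NameNode", "SecondaryNameNode", "JobTracker", "HMaster"]
SLAVE_ROLES = ["DataNode", "TaskTracker", "HRegionServer"]


def plan_layout(num_nodes, small_cluster=10, tiny_cluster=4):
    """Return {host index: [roles]} following the placement rules above."""
    if num_nodes <= tiny_cluster:
        # Really small cluster: all master roles share one host.
        master_hosts = [MASTER_ROLES]
    elif num_nodes <= small_cluster:
        # Small cluster: NameNode alone, the other masters combined.
        master_hosts = [["NameNode"], MASTER_ROLES[1:]]
    else:
        # Larger cluster: one dedicated host per master role.
        master_hosts = [[role] for role in MASTER_ROLES]

    if num_nodes <= len(master_hosts):
        raise ValueError("not enough hosts left over for worker nodes")

    layout = {i: list(roles) for i, roles in enumerate(master_hosts)}
    # Every remaining host runs all slave roles side by side.
    for i in range(len(master_hosts), num_nodes):
        layout[i] = list(SLAVE_ROLES)
    return layout
```

For example, `plan_layout(8)` puts the NameNode on host 0, the other three master roles on host 1, and DataNode, TaskTracker, and HRegionServer together on each of hosts 2 through 7.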

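SecondaryNameNode is a misnomer: it only checkpoints the NameNode's metadata and cannot take over if the NameNode fails. A true failover NameNode (the Standby NameNode) exists only when High Availability is enabled.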
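What the SecondaryNameNode does in a non-HA cluster is checkpointing: it periodically merges the NameNode's namespace image (fsimage) with the edit log so the log does not grow without bound. The toy model below illustrates that merge only; the dict-based image, the `(op, path, ...)` edit tuples, and the `checkpoint` function are hypothetical stand-ins, not the real fsimage or edits formats.

```python
def checkpoint(fsimage, edits):
    """Replay an edit log onto a namespace snapshot and return the new image.

    fsimage: dict mapping path -> file size (toy stand-in for the fsimage).
    edits:   list of (op, path, *args) tuples (hypothetical edit format).
    """
    image = dict(fsimage)  # work on a copy; the NameNode's image is untouched
    for op, path, *args in edits:
        if op == "create":
            image[path] = 0
        elif op == "append":
            image[path] += args[0]
        elif op == "delete":
            image.pop(path, None)
        elif op == "rename":
            image[args[0]] = image.pop(path)
        else:
            raise ValueError(f"unknown edit op: {op}")
    return image
```

After a merge like this, the NameNode can restart from the merged image and drop the edits it already contains. That shortens restarts, but it does not provide failover.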

Does MapR maintain this setup? How is it similar and how is it different?

Matthew Moisen

4 Answers


Good information by @JamCon in his reply, but there are some things worth clarifying:

The comment regarding patches is not accurate. MapR packages a broad range of Hadoop projects in its distribution, so you don't have to compile anything separately. MapR also exposes the same APIs as any other distribution, so its packages are not about compatibility; they are simply bug fixes and enhancements from the community. There's typically no extra work required to get Hadoop ecosystem projects to run on MapR. And, as far as I can tell, they release ecosystem updates at least once a month to keep current with new enhancements.

Regarding the inclusion of YARN, we've been running MapR on YARN across large clusters since July '14! I believe MapR has its own process for vetting ecosystem projects, and it graduates the MapR-packaged versions to GA once it determines a project is ready for enterprise support.

Byron Dover

MapR deviates from the vanilla Hadoop and CDH distributions a bit. It keeps most of the services and structure (JobTracker, data nodes, HBase Master and RegionServers, MapReduce, etc.), but there are some significant differences.

One of the defining features of MapR's distribution is that it doesn't use HDFS. It has its own custom file system, which features HA and operates without NameNodes (via distributed metadata). It also allowed MapR to offer NFS access, as well as snapshotting, years ahead of the other Hadoop distros.
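Because MapR's file system can be mounted over NFS, ordinary file APIs work on cluster data without going through an HDFS client. A minimal Python sketch, assuming the cluster is mounted at `/mapr/my.cluster.com` (a hypothetical cluster name; adjust to your own mount point). The helper `append_log_line` is also hypothetical and uses only the standard library.

```python
from pathlib import Path

# Hypothetical mount point: MapR NFS exports are commonly mounted under
# /mapr/<cluster-name>, but the cluster name here is made up.
CLUSTER_ROOT = Path("/mapr/my.cluster.com")


def append_log_line(root, relative_path, line):
    """Append one line to a file under `root` using only plain file I/O."""
    target = Path(root) / relative_path
    # Directories are created with ordinary mkdir calls, no Hadoop client needed.
    target.parent.mkdir(parents=True, exist_ok=True)
    with open(target, "a", encoding="utf-8") as f:
        f.write(line + "\n")
    return target
```

On a host with the mount, `append_log_line(CLUSTER_ROOT, "logs/app.log", "job started")` writes straight into the cluster; the same function works unchanged against any local directory, which is the point of POSIX/NFS access.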

The custom FS does complicate their distribution a bit, though. For example, when you want to run products or services, you often need to install MapR-specific patches: to run Mahout, you need to compile it with the MapR patches from https://github.com/mapr/mahout. But it also gives them an opportunity to build better security in at the FS level, as seen in the implementation of "Access Control Expressions" and Cluster/Job/Volume ACLs.
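Access Control Expressions combine user, group, and role terms with boolean operators. The simplified evaluator below assumes a syntax of `u:<user>`, `g:<group>`, `r:<role>`, `p` (public), `!`, `&`, `|`, and parentheses, with `!` binding tightest, then `&`, then `|`. It is modeled loosely on MapR's ACE syntax, not taken from MapR code; `tokenize` and `evaluate` are hypothetical names, and the exact grammar and precedence should be checked against the MapR documentation.

```python
import re

# Assumed grammar: terms u:<user>, g:<group>, r:<role>, p (public);
# operators ! (not), & (and), | (or); parentheses for grouping.
TOKEN_RE = re.compile(r"\s*(?:([ugr]:[\w.-]+)|(p)\b|([!&|()]))")


def tokenize(expr):
    tokens, pos, expr = [], 0, expr.strip()
    while pos < len(expr):
        m = TOKEN_RE.match(expr, pos)
        if not m:
            raise ValueError(f"bad token at {pos}: {expr[pos:]!r}")
        tokens.append(m.group(1) or m.group(2) or m.group(3))
        pos = m.end()
    return tokens


def evaluate(expr, user, groups=(), roles=()):
    """Recursive-descent evaluation: or -> and -> not -> atom."""
    tokens = tokenize(expr)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take():
        nonlocal pos
        tok = peek()
        if tok is None:
            raise ValueError("unexpected end of expression")
        pos += 1
        return tok

    def parse_or():
        value = parse_and()
        while peek() == "|":
            take()
            rhs = parse_and()  # always parse the right side before combining
            value = value or rhs
        return value

    def parse_and():
        value = parse_not()
        while peek() == "&":
            take()
            rhs = parse_not()
            value = value and rhs
        return value

    def parse_not():
        if peek() == "!":
            take()
            return not parse_not()
        return parse_atom()

    def parse_atom():
        tok = take()
        if tok == "(":
            value = parse_or()
            if take() != ")":
                raise ValueError("expected ')'")
            return value
        if tok == "p":
            return True  # public: everyone matches
        kind, _, name = tok.partition(":")
        if kind == "u":
            return name == user
        if kind == "g":
            return name in groups
        if kind == "r":
            return name in roles
        raise ValueError(f"unexpected token {tok!r}")

    result = parse_or()
    if pos != len(tokens):
        raise ValueError(f"trailing tokens: {tokens[pos:]}")
    return result
```

For instance, `evaluate("g:eng & !u:mallory", "alice", groups=["eng"])` grants access to every member of `eng` except the user `mallory`.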

Overall, it's a well structured product. My biggest concern is they've deviated so far from the norm that when new innovations are adopted, they're slow to adapt, because it has to be incorporated into their highly modified environment. YARN is a perfect example ... they haven't released it yet, even though their competitors have.

JamCon
  • Thanks. As an update, it looks like [MapR incorporates YARN](http://www.mapr.com/blog/take-charge-hadoop-2x-and-yarn#.UzIqd1dUN1E) as of 2/11/2014 – Matthew Moisen Mar 26 '14 at 01:17
  • Ah, good point ... I hadn't checked up on them since January. I'm actually supposed to meet up with a couple MapR engineers next week to discuss the recent updates! – JamCon Mar 26 '14 at 01:24

From an architecture standpoint, MapR has no master nodes. The functions that master nodes provide in a typical Hadoop architecture are instead distributed and performed within MapR's "data nodes".

https://www.mapr.com/why-hadoop/why-mapr/architecture-matters
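The general idea of spreading master-node duties across the data nodes can be illustrated with plain hash partitioning: every client computes which node owns a path's metadata, so no central NameNode is consulted for the lookup. This is a generic toy, not MapR's actual scheme (see the link above for how MapR describes its architecture); `metadata_owner`, `metadata_load`, and the node names are hypothetical.

```python
import hashlib


def metadata_owner(path, nodes):
    """Pick the node responsible for a path's metadata by hashing the path.

    Every client computes the same answer locally, so no single master
    has to be consulted for the lookup.
    """
    if not nodes:
        raise ValueError("no nodes available")
    # hashlib (unlike built-in hash() on str) is stable across processes.
    digest = hashlib.sha1(path.encode("utf-8")).digest()
    return nodes[int.from_bytes(digest[:8], "big") % len(nodes)]


def metadata_load(paths, nodes):
    """Count how many paths' metadata each node owns."""
    load = {node: 0 for node in nodes}
    for path in paths:
        load[metadata_owner(path, nodes)] += 1
    return load
```

Plain modulo hashing remaps most paths whenever a node is added or removed, so real systems typically use consistent hashing or an explicit location table instead.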

Larry Advey

MapR doesn't have master nodes; that functionality is built into its architecture. Cloudera, by contrast, has a master node, a SecondaryNameNode, and a ResourceManager: http://commandstech.com/mapr-vs-cloudera-vs-hortonworks/