
I am new to Hadoop, and want to know the differences between Hadoop-common, Hadoop-core and Hadoop-client.

By the way, for a given class, how do I know which artifact contains it in Maven? For example, which one contains org.apache.hadoop.io.Text?

chenzhongpu

3 Answers


To provide some additional detail on the differences between Hadoop-common, Hadoop-core and Hadoop-client, from a high-level perspective:

  • Hadoop-common refers to the commonly used utilities and libraries that support the Hadoop modules.
  • Hadoop-core is the same as Hadoop-common; it was renamed to Hadoop-common in July 2009, per https://hadoop.apache.org/.
  • Hadoop-client refers to the client libraries used to communicate with Hadoop's common components (HDFS, MapReduce, YARN), including supporting pieces such as logging and codecs.

Generally speaking, developers building applications that submit jobs to YARN, run a MapReduce job, or access files on HDFS should use the Hadoop-client libraries.
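
As a rough illustration, such an application typically declares a single hadoop-client dependency in its pom.xml. This is a minimal sketch, not a complete build file; the version shown (2.6.0) is an assumption for a Hadoop 2.x setup, so match it to whatever your cluster runs:

```xml
<!-- Minimal sketch: one hadoop-client dependency for a Hadoop 2.x app.
     The version here is an assumption; use the one matching your cluster. -->
<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-client</artifactId>
  <version>2.6.0</version>
</dependency>
```

In Hadoop 2.x this transitively pulls in hadoop-common, which is the artifact that contains classes such as org.apache.hadoop.io.Text.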

Anthony R.

In order to build a Hadoop MapReduce application, you need only the hadoop-client dependency (use the new API). Dependencies such as hadoop-hdfs, hadoop-common, hadoop-mapreduce-client-app and hadoop-yarn-api are resolved transitively from it, as the dependency tree sketched below shows.
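
You can check this yourself by running `mvn dependency:tree` on a project that declares only hadoop-client. The excerpt below is an illustrative, trimmed sketch: the project coordinates com.example:my-mr-app are hypothetical, and the versions and exact artifact list vary by Hadoop release (2.6.0 is assumed here):

```
[INFO] com.example:my-mr-app:jar:1.0
[INFO] \- org.apache.hadoop:hadoop-client:jar:2.6.0:compile
[INFO]    +- org.apache.hadoop:hadoop-common:jar:2.6.0:compile
[INFO]    +- org.apache.hadoop:hadoop-hdfs:jar:2.6.0:compile
[INFO]    +- org.apache.hadoop:hadoop-mapreduce-client-app:jar:2.6.0:compile
[INFO]    +- org.apache.hadoop:hadoop-mapreduce-client-core:jar:2.6.0:compile
[INFO]    \- org.apache.hadoop:hadoop-yarn-api:jar:2.6.0:compile
```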

Sachin
  • In `hadoop-client`'s `pom` file, I only found a dependency on `org.apache.hadoop:hadoop-core`. And `hadoop-core` seems to have no dependencies on other `hadoop-*` artifacts at all. Please explain in detail. – chenzhongpu Mar 06 '15 at 04:11
  • Please see the dependency tree tab of your pom file, so that you can see all the dependencies resolved from it. – Sachin Mar 06 '15 at 04:15

From Techopedia:

Hadoop Common refers to the collection of common utilities and libraries that support other Hadoop modules. It is an essential part or module of the Apache Hadoop Framework, along with the Hadoop Distributed File System (HDFS), Hadoop YARN and Hadoop MapReduce.

Like all other modules, Hadoop Common assumes that hardware failures are common and that these should be automatically handled in software by the Hadoop Framework.

Hadoop Common is also known as Hadoop Core.

Hadoop Client libraries help to load data into the cluster, submit MapReduce jobs describing how that data should be processed, and then retrieve or view the results of the job when it is finished. Have a look at this article.

This Apache link provides the list of dependencies of the Hadoop Client library.

Ravindra babu