I started researching data science and machine learning development using Mahout, and I came across Hadoop. Both confused me:

  1. What is the relationship between Hadoop and Mahout?
  2. For data science and machine learning, what is the best place to start?
Abu taha
    Mahout is a machine learning library; Hadoop is a distributed computing framework; and for data science, [here](http://en.wikipedia.org/wiki/Data_science) you go! Do some research! Google and Wikipedia may be a good start for you! – eliasah Dec 30 '14 at 14:37

Hadoop is a framework for processing large data sets, built on the concepts of distributed storage and distributed processing. It has a distributed storage layer called the Hadoop Distributed File System (HDFS) and a distributed processing layer called MapReduce. Hadoop is designed to run on commodity hardware, and it is written in Java.
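To make the MapReduce idea more concrete, here is a minimal single-process sketch of the programming model, using the classic word-count example. It does not use Hadoop at all. The function names `map_phase`, `shuffle`, `reduce_phase` and `word_count` are hypothetical, chosen for illustration. The in-memory grouping stands in for the shuffle step that Hadoop performs across machines between the map and reduce phases.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    # Mapper: emit a (word, 1) pair for every word in one input line.
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    # Shuffle: group all emitted values by key. Hadoop does this across
    # the cluster; here it is a plain in-memory dictionary.
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    # Reducer: combine all values for one key into a single result.
    return key, sum(values)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    # On a real cluster, each mapper would read a different split of a file
    # stored in HDFS; here all lines are processed in one loop.
    emitted = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle(emitted)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    docs = ["Hadoop stores data in HDFS", "Hadoop processes data with MapReduce"]
    print(word_count(docs))
```

The mapper and reducer only ever see one line or one key at a time. That restriction is what lets a framework like Hadoop run many copies of them in parallel on different machines.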

Mahout is a member of the Hadoop ecosystem and contains implementations of various machine learning algorithms. Mahout uses Hadoop's parallel processing capability, so end users can apply these algorithms to large data sets without much added complexity. Users can either reuse the algorithms directly or customize them, without needing to worry much about the complexities of each algorithm's MapReduce implementation.
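As an illustration of how a machine learning algorithm can be split into map and reduce steps, here is a minimal pure-Python sketch of k-means clustering. This is not Mahout's API or its actual implementation. It is a single-machine sketch, and the names `nearest`, `kmeans_map`, `kmeans_reduce`, `kmeans_iteration` and `kmeans` are hypothetical. Each iteration maps every point to its nearest centroid and then reduces each cluster to the mean of its points.

```python
import math
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, ...]


def nearest(point: Point, centroids: Sequence[Point]) -> int:
    # Index of the centroid closest to `point` (Euclidean distance).
    return min(range(len(centroids)), key=lambda i: math.dist(point, centroids[i]))


def kmeans_map(point: Point, centroids: Sequence[Point]) -> Tuple[int, Point]:
    # Mapper: emit (cluster_id, point) for the nearest centroid.
    return nearest(point, centroids), point


def kmeans_reduce(points: List[Point]) -> Point:
    # Reducer: the new centroid is the coordinate-wise mean of the cluster.
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def kmeans_iteration(points: Sequence[Point], centroids: List[Point]) -> List[Point]:
    # Group mapper output by cluster id (the "shuffle" step), then reduce.
    groups: Dict[int, List[Point]] = defaultdict(list)
    for p in points:
        cluster_id, pt = kmeans_map(p, centroids)
        groups[cluster_id].append(pt)
    # A cluster with no assigned points keeps its previous centroid.
    return [kmeans_reduce(groups[i]) if groups[i] else centroids[i]
            for i in range(len(centroids))]


def kmeans(points: Sequence[Point], centroids: List[Point], max_iter: int = 10) -> List[Point]:
    # Repeat map/reduce rounds until the centroids stop changing.
    for _ in range(max_iter):
        updated = kmeans_iteration(points, centroids)
        if updated == centroids:
            break
        centroids = updated
    return centroids


if __name__ == "__main__":
    pts = [(1.0, 1.0), (1.0, 2.0), (9.0, 9.0), (10.0, 9.0)]
    print(kmeans(pts, [(0.0, 0.0), (10.0, 10.0)]))
```

In a distributed setting, each round would run as a separate job over data stored in HDFS, with the current centroids shared with every mapper. A library that packages this kind of decomposition saves the user from writing it by hand.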

For data science and machine learning, you should first learn how the algorithms work and how to use them. Then you can concentrate on Mahout. Since Mahout jobs in distributed mode are MapReduce jobs, you should also learn Hadoop fundamentals and MapReduce programming.

Amal G Jose