12

I am a web developer with experience in web technologies such as JavaScript, jQuery, PHP, and HTML, and I know the basic concepts of C. I recently became interested in learning more about MapReduce and Hadoop, so I enrolled in the parallel data processing in MapReduce course at my university. Since I have no prior programming experience in object-oriented languages like Java or C++, how should I go about learning MapReduce and Hadoop? I have started reading the Yahoo Hadoop tutorial and O'Reilly's Hadoop: The Definitive Guide, 2nd Edition.

I would appreciate suggestions on how to go about learning MapReduce and Hadoop.

yesh

8 Answers

5

You can access Hadoop from many different languages, and a number of services will set up Hadoop for you. You could try Amazon's Elastic MapReduce (EMR), for instance, without having to go through the hassle of configuring the servers, workers, etc. This is a good way to get your head around MapReduce processing while postponing the issues of learning how to use HDFS well, how to manage your scheduler, etc.

It's not hard to search for your favorite language and find Hadoop APIs for it, or at least some tutorials on linking it with Hadoop. For instance, here's a walkthrough of a PHP app run on Hadoop: http://www.lunchpauze.com/2007/10/writing-hadoop-mapreduce-program-in-php.html
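To give a feel for what a non-Java job can look like, here is a minimal word-count sketch in the Hadoop Streaming style, where the mapper and reducer are ordinary programs that read lines from stdin and write tab-separated key/value lines to stdout. The file name `wordcount.py` and the `map`/`reduce` argument are my own choices for illustration, not anything Hadoop requires.

```python
#!/usr/bin/env python3
"""Word count in the Hadoop Streaming style (illustrative sketch).

Hypothetical usage: `wordcount.py map` or `wordcount.py reduce`.
"""
import sys
from itertools import groupby


def map_lines(lines):
    """Mapper: emit one 'word<TAB>1' record for every word seen."""
    for line in lines:
        for word in line.split():
            yield f"{word.lower()}\t1"


def reduce_lines(lines):
    """Reducer: sum the counts for each word.

    Assumes the input is sorted by key, so equal words are adjacent.
    Hadoop sorts mapper output before the reduce phase; locally, `sort` does it.
    """
    pairs = (line.rstrip("\n").split("\t", 1) for line in lines if line.strip())
    for word, group in groupby(pairs, key=lambda kv: kv[0]):
        total = sum(int(count) for _, count in group)
        yield f"{word}\t{total}"


# Only touch stdin when a stage argument is given explicitly.
if __name__ == "__main__" and len(sys.argv) > 1:
    stage = map_lines if sys.argv[1] == "map" else reduce_lines
    for record in stage(sys.stdin):
        print(record)
```

You can try it without a cluster by piping through `sort`, which stands in for Hadoop's shuffle step: `cat input.txt | python3 wordcount.py map | sort | python3 wordcount.py reduce`. On a real cluster the same script would be handed to the streaming jar through its `-mapper` and `-reducer` options; where that jar lives depends on your installation.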

Iterator
4

Answer 1:

  • Knowing Java is very desirable. Hadoop is written in Java, and its popular SequenceFile format depends on Java.
  • Even if you use Hive or Pig, you'll probably need to write your own UDF someday. Some people write them in other languages, but I believe Java has the more robust, first-class support for them.
  • Most Hadoop tools (like Sqoop, HCatalog, and so on) are not mature enough yet, so you'll see many Java stack traces, and you'll probably want to hack the source code someday.

Answer 2:

  • You are not required to know Java.
  • As the others said, it would be very helpful, depending on how complex your processing is. However, there is an incredible amount you can do with just Pig and, say, Hive.
  • I agree that you will quite likely need to write a user-defined function (UDF) eventually. However, I've written those in Python, and it is very easy to do.
  • Granted, if you have very stringent performance requirements, then a Java-based MapReduce program would be the way to go. However, great advances in performance are being made all the time in both Pig and Hive.
  • So the short answer to your question is "No": you are not required to know Java in order to do Hadoop development.
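As a rough illustration of the Python UDF point, here is a small sketch of the kind of function one might register with Pig. The answer does not say whether its UDFs were for Pig or Hive, so Pig is an assumption here, and the function name `email_domain` and the schema string are made up for the example. The fallback decorator exists only so the file also runs outside Pig.

```python
# Sketch of a Python UDF for Pig (assumed target; the function itself is hypothetical).
try:
    # Pig supplies this helper for CPython ("streaming_python") UDFs.
    from pig_util import outputSchema
except ImportError:
    # Fallback so the function can be run and tested without Pig installed.
    def outputSchema(schema):
        def wrap(func):
            return func
        return wrap


@outputSchema("domain:chararray")
def email_domain(address):
    """Return the lower-cased domain of an e-mail address, or None if absent."""
    if address is None or "@" not in address:
        return None
    return address.rsplit("@", 1)[1].lower()
```

In a Pig script, a file like this would typically be registered with a statement along the lines of `REGISTER 'udfs.py' USING streaming_python AS myudfs;` and then called as `myudfs.email_domain(email)`; the file name and alias are placeholders.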

Source : http://www.linkedin.com/groups/Is-it-must-Hadoop-Developer-988957.S.141072851

Abhishek Goel
3

1) Learn Java. No way around that, sorry.

2) Profit! It'll be very easy after that -- Hadoop is pretty darn simple.

Ernest Friedman-Hill
  • Since Java is a huge programming language with many libraries, is there anything specific I can read about, or do I need to know core Java to implement Hadoop? – yesh Sep 06 '11 at 00:25
  • You wouldn't need to know anything except the language and the core APIs, primarily the `java.lang` and `java.util` packages. So no Servlets or EJBs or Spring or any other such frameworks. – Ernest Friedman-Hill Sep 06 '11 at 00:49
  • Sorry, #1 is wrong, though learning some Java (e.g. knowing what a classpath is) is useful and may be necessary. Many languages work with Hadoop; that's the beauty of Hadoop Streaming and lots of APIs. – Iterator Sep 06 '11 at 01:17
  • There is no need to learn Java to use Hadoop. Hadoop Streaming (1) can be used to write Hadoop jobs in a variety of languages. (1) http://hadoop.apache.org/common/docs/r0.15.2/streaming.html – Praveen Sripati Sep 06 '11 at 01:24
2

Go through the Yahoo Hadoop tutorial before going through Hadoop: The Definitive Guide. The Yahoo tutorial gives you a very clean and easy understanding of the architecture. I think the concepts in the book are not arranged well, which makes it a little difficult to study. So do not study them together; go through the web tutorial first.

Nilsaw
2

It sounds like you are on the right track. I recommend setting up some virtual machines on your home computer so you can take what you see in the books and implement it in your VMs. As with many things, the only way to get better at something is to practice it. Once you get into it, I am sure you will have enough knowledge to start a small project that uses Hadoop. Here are some examples of things people have built with Hadoop: Powered by Hadoop

ITOps
1

I just put together a paper on this topic. Great resources above, but I think you'll find some additional pointers here: http://images.globalknowledge.com/wwwimages/whitepaperpdf/WP_CL_Learning_Hadoop.pdf

rICh
1

Feel free to join my blog about Big Data: https://oyermolenko.blog. I've been working with Hadoop for a couple of years, and in this blog I want to share my experience from the very start. I came from a .NET environment and faced a couple of challenges in switching from one language to another. My blog is aimed at people who haven't worked with Hadoop but, like you, have some technical background. Step by step, I want to cover the whole family of Big Data services and describe the concepts and common problems I ran into while working with them. I hope you will enjoy it.

Alex