
I am working with Hadoop now, in pseudo-distributed mode. I have tried some MapReduce: I package it as a jar, copy the file to the Hadoop machine, and then use

./bin/hadoop jar *

to start it.

My question is: is there any other way to do it? If we have thousands of jobs to run, we can't just type the commands in by hand. What do we do in the PRODUCTION ENVIRONMENT?

Thanks.

  • Normally we write the jobs into a shell script, e.g. a jobs.sh containing "bin/hadoop jar a.jar" and "bin/hadoop jar b.jar" on separate lines, and they run one by one in FIFO order; that's a batch job. If you want to run jobs in parallel in Hadoop, try the Fair Scheduler or the Capacity Scheduler. – Matuobasyouca Jul 10 '12 at 08:34

3 Answers

If you have thousands of jobs and there are no dependencies between them, write a shell script that submits them one after another. If there are dependencies, then try using Apache Oozie, as Chris mentioned.
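
As a minimal sketch, assuming each job jar's manifest names its main class and the script is run from the Hadoop installation directory ("/path/to/jobs" is a placeholder):

    #!/bin/sh
    # Submit every job jar in a directory, one after another (FIFO).
    for jar in /path/to/jobs/*.jar; do
        ./bin/hadoop jar "$jar" || echo "job $jar failed" >&2
    done

Each "hadoop jar" invocation blocks until its driver exits, so the jobs run sequentially without any extra coordination.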

– Praveen Sripati

It is possible to launch MapReduce jobs in an automated way; for example, you can launch a job from a Java program. The trick is to ensure that you export your job into a jar file and then call that exported jar file from your (separate) Java code. I had a similar question recently and posted it; perhaps it relates to you as well:

Launch a mapreduce job from eclipse
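
As a rough sketch of the programmatic route, a standard driver can configure and submit a job from Java using the org.apache.hadoop.mapreduce API; MyMapper, MyReducer, and the output types below are placeholders for your own job, not anything Hadoop prescribes:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class JobLauncher {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            Job job = Job.getInstance(conf, "my-job");
            // Tell the cluster which jar holds the job classes.
            job.setJarByClass(JobLauncher.class);
            // MyMapper and MyReducer are placeholders for your own classes.
            job.setMapperClass(MyMapper.class);
            job.setReducerClass(MyReducer.class);
            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(IntWritable.class);
            FileInputFormat.addInputPath(job, new Path(args[0]));
            FileOutputFormat.setOutputPath(job, new Path(args[1]));
            // Blocks until the job finishes; exit code reflects success.
            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }

Such a launcher can then be started from cron or any other scheduler instead of being typed in by hand.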

– Tucker

If you need to schedule jobs to run, or want to design a workflow of jobs with inter-dependencies, then look into Apache Oozie.
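
To give a flavour, here is a minimal sketch of an Oozie workflow.xml with two MapReduce actions, where job-b runs only after job-a succeeds; the action names, schema version, and the ${jobTracker}/${nameNode} parameters are illustrative:

    <workflow-app xmlns="uri:oozie:workflow:0.2" name="example-wf">
        <start to="job-a"/>
        <action name="job-a">
            <map-reduce>
                <job-tracker>${jobTracker}</job-tracker>
                <name-node>${nameNode}</name-node>
                <configuration>
                    <!-- mapper/reducer classes and I/O paths for the first job -->
                </configuration>
            </map-reduce>
            <ok to="job-b"/>
            <error to="fail"/>
        </action>
        <action name="job-b">
            <map-reduce>
                <job-tracker>${jobTracker}</job-tracker>
                <name-node>${nameNode}</name-node>
                <configuration>
                    <!-- configuration for the dependent job -->
                </configuration>
            </map-reduce>
            <ok to="end"/>
            <error to="fail"/>
        </action>
        <kill name="fail">
            <message>Failed: ${wf:errorMessage(wf:lastErrorNode())}</message>
        </kill>
        <end name="end"/>
    </workflow-app>

The workflow is then submitted with the oozie command-line client together with a job.properties file that supplies those parameter values.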

– Chris White