
I am working with Hadoop now, in pseudo-distributed mode. I have tried some MapReduce: I package it as a jar, copy the file to the Hadoop machine, and then use

./bin/hadoop jar *

to start it.

My question is: is there any other way to do it? If we have thousands of jobs to run, we can't just type the commands in by hand. What do we do in the PRODUCTION ENVIRONMENT?

Thanks.

  • Normally we write the jobs into a shell script, e.g. a jobs.sh containing "bin/hadoop jar a.jar" and "bin/hadoop jar b.jar" on separate lines, and they run one by one in FIFO order; that's a batch job. If you want to run jobs in parallel in Hadoop, try the Fair Scheduler or the Capacity Scheduler. – Matuobasyouca Jul 10 '12 at 08:34

3 Answers

If you have thousands of jobs and there are no dependencies between them, write a shell script that submits them one after another. If there are dependencies, then try using Apache Oozie, as Chris mentioned.
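
As a minimal sketch, assuming each job jar's manifest names its main class and the script is run from the Hadoop installation directory ("/path/to/jobs" is a placeholder):

    #!/bin/sh
    # Submit every job jar in a directory, one after another (FIFO).
    for jar in /path/to/jobs/*.jar; do
        ./bin/hadoop jar "$jar" || echo "job $jar failed" >&2
    done

Each "hadoop jar" invocation blocks until its driver exits, so the jobs run sequentially without any extra coordination.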

– Praveen Sripati

It is possible to launch MapReduce jobs in an automated way; for example, you can launch a job from a Java program. The trick is to ensure that you export your job into a jar file and then call that exported jar file from your (separate) Java code. I had a similar question recently and posted it; perhaps it relates to you as well:

Launch a mapreduce job from eclipse
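
As a rough sketch of the programmatic route, a standard driver can configure and submit a job from Java using the org.apache.hadoop.mapreduce API; MyMapper, MyReducer, and the output types below are placeholders for your own job, not anything Hadoop prescribes:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class JobLauncher {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            Job job = Job.getInstance(conf, "my-job");
            // Tell the cluster which jar holds the job classes.
            job.setJarByClass(JobLauncher.class);
            // MyMapper and MyReducer are placeholders for your own classes.
            job.setMapperClass(MyMapper.class);
            job.setReducerClass(MyReducer.class);
            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(IntWritable.class);
            FileInputFormat.addInputPath(job, new Path(args[0]));
            FileOutputFormat.setOutputPath(job, new Path(args[1]));
            // Blocks until the job finishes; exit code reflects success.
            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }

Such a launcher can then be started from cron or any other scheduler instead of being typed in by hand.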

– Tucker

If you need to schedule jobs to run, or want to design a workflow of jobs with inter-dependencies, then look into Apache Oozie.
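
To give a flavour, here is a minimal sketch of an Oozie workflow.xml with two MapReduce actions, where job-b runs only after job-a succeeds; the action names, schema version, and the ${jobTracker}/${nameNode} parameters are illustrative:

    <workflow-app xmlns="uri:oozie:workflow:0.2" name="example-wf">
        <start to="job-a"/>
        <action name="job-a">
            <map-reduce>
                <job-tracker>${jobTracker}</job-tracker>
                <name-node>${nameNode}</name-node>
                <configuration>
                    <!-- mapper/reducer classes and I/O paths for the first job -->
                </configuration>
            </map-reduce>
            <ok to="job-b"/>
            <error to="fail"/>
        </action>
        <action name="job-b">
            <map-reduce>
                <job-tracker>${jobTracker}</job-tracker>
                <name-node>${nameNode}</name-node>
                <configuration>
                    <!-- configuration for the dependent job -->
                </configuration>
            </map-reduce>
            <ok to="end"/>
            <error to="fail"/>
        </action>
        <kill name="fail">
            <message>Failed: ${wf:errorMessage(wf:lastErrorNode())}</message>
        </kill>
        <end name="end"/>
    </workflow-app>

The workflow is then submitted with the oozie command-line client together with a job.properties file that supplies those parameter values.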

– Chris White