Hadoop streaming is advantageous when the developer does not know Java well and can write the mapper/reducer faster in a scripting language.
Compared to custom jar jobs, a streaming job has the additional overhead of starting a scripting-language (Python/Ruby/Perl) interpreter for each task, and every record has to be passed between the Java task and that process over stdin/stdout. This inter-process communication makes streaming jobs less efficient in most cases.
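To make the stdin/stdout hand-off concrete, here is a minimal word-count sketch in Python. It assumes the default streaming convention of one record per line with key and value separated by a tab. The function names and the "map"/"reduce" command-line argument are made up for illustration and are not part of Hadoop.

```python
import sys
from itertools import groupby


def map_lines(lines):
    """Mapper: emit one 'word<TAB>1' line for every word in the input."""
    for line in lines:
        for word in line.split():
            yield f"{word}\t1"


def reduce_lines(lines):
    """Reducer: sum the counts per word; input lines must be sorted by key."""
    pairs = (line.rstrip("\n").split("\t", 1) for line in lines)
    for word, group in groupby(pairs, key=lambda kv: kv[0]):
        total = sum(int(count) for _, count in group)
        yield f"{word}\t{total}"


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical convention: pass "map" or "reduce" as the first argument.
    step = map_lines if sys.argv[1] == "map" else reduce_lines
    for out in step(sys.stdin):
        print(out)
```

Saved as, say, wc.py (a hypothetical name), the same file could be given to the streaming jar as both `-mapper "python wc.py map"` and `-reducer "python wc.py reduce"`. It can also be tried locally with `cat input.txt | python wc.py map | sort | python wc.py reduce`, which roughly imitates the sort between the map and reduce phases.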
Hadoop streaming also restricts the input/output formats you can use. When you need a custom input/output format, a custom jar is the natural choice. Also, with Java you can override or extend much of Hadoop's functionality to suit your needs.
Quoting from an answer here:
Hadoop does have the capability to work with MR jobs created in other
languages - it is called streaming. This model only allows us to define
mapper and reducer, with some restrictions not present in Java. At the
same time, input/output formats and other plugins do have to be
written as Java classes. So I would define decision making as
following:
- Use Java, unless you have a serious codebase you need to reuse in your MR job.
- Consider using Python when you need to create some simple ad hoc jobs.
As for streaming being available only for the mapred API, that concern doesn't make sense. With streaming, the mappers/reducers are written in another language, so there is no point worrying about which API Hadoop uses internally to execute them.