I have set up a new cluster using Cloudera Manager 5.5.1. The two properties mapreduce.map.java.opts.max.heap and mapreduce.reduce.java.opts.max.heap seem to overlap with two other properties, mapreduce.map.java.opts and mapreduce.reduce.java.opts.

Should I use the former or the latter set of properties?

Roney Michael
user1965449

2 Answers

Both sets serve the same purpose here: setting the maximum heap of the map and reduce task JVMs. They differ in how the value is written.

As far as I can tell, mapreduce.map.java.opts.max.heap and mapreduce.reduce.java.opts.max.heap are specific to the Cloudera distribution. See: Tuning YARN.

mapreduce.map.java.opts and mapreduce.reduce.java.opts are part of the standard Hadoop configuration. Check the Hadoop trunk code here: MRJobConfig.java

Also, the ticket at https://issues.cloudera.org/browse/DISTRO-752 discusses setting these values.

For example, mapreduce.map.java.opts.max.heap and mapreduce.reduce.java.opts.max.heap are specified as a plain number (here 983 MB):

<property>
    <name>mapreduce.map.java.opts.max.heap</name>
    <value>983</value>
</property>
<property>
    <name>mapreduce.reduce.java.opts.max.heap</name>
    <value>983</value>
</property>

mapreduce.map.java.opts and mapreduce.reduce.java.opts are specified as JVM options (again 983 MB). These are the settings I use:

<property>
    <name>mapreduce.map.java.opts</name>
    <value>-Xmx983m</value>
</property>
<property>
    <name>mapreduce.reduce.java.opts</name>
    <value>-Xmx983m</value>
</property>

Note the difference in the values: one is set as "983" and the other as "-Xmx983m".
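To make the relationship between the two notations concrete, here is a minimal Python sketch that converts between them. The helper names (heap_mb_to_java_opts, java_opts_to_heap_mb) are hypothetical and not part of Hadoop or Cloudera Manager. It assumes the usual JVM size suffixes (k, m, g, or none for bytes) and, as a further assumption, uses the last -Xmx flag if several are present.

```python
import re

# Matches a JVM max-heap flag such as -Xmx983m, -Xmx1g or -Xmx1030750208.
_XMX_PATTERN = re.compile(r"-Xmx(\d+)([kKmMgG]?)")

# Multipliers for the JVM size suffixes; no suffix means plain bytes.
_UNIT_BYTES = {"": 1, "k": 1024, "m": 1024 ** 2, "g": 1024 ** 3}


def heap_mb_to_java_opts(heap_mb):
    """Render a heap size in MB as the -Xmx flag used in *.java.opts."""
    if heap_mb <= 0:
        raise ValueError("heap size must be a positive number of megabytes")
    return f"-Xmx{heap_mb}m"


def java_opts_to_heap_mb(java_opts):
    """Return the -Xmx value in whole MB, or None if no -Xmx flag is set."""
    matches = _XMX_PATTERN.findall(java_opts)
    if not matches:
        return None
    # Assumption: when -Xmx is repeated, the last occurrence is the one used.
    number, unit = matches[-1]
    return int(number) * _UNIT_BYTES[unit.lower()] // (1024 ** 2)


if __name__ == "__main__":
    print(heap_mb_to_java_opts(983))
    print(java_opts_to_heap_mb("-Djava.net.preferIPv4Stack=true -Xmx983m"))
```

A helper like this can be handy for checking that a value entered as a plain number in one place and as a JVM flag in another really describes the same heap size.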

Manjunath Ballur
  • Great answer. If they conflict (have different values e.g. one is global another is set through hive's set parameter), I guess mapreduce.map/reduce.java.opts would win over mapreduce.map/reduce.java.opts.max.heap as they are more generic? – Tagar Jan 28 '16 at 05:49
  • Yes. mapreduce.map.java.opts and mapreduce.reduce.java.opts are part of standard Hadoop configuration. These "mapreduce.map.java.opts.max.heap and mapreduce.reduce.java.opts.max.heap" seem more specific to Cloudera distribution. I have not seen these parameters in the Hadoop code. – Manjunath Ballur Jan 28 '16 at 05:53
  • Also, according to the bug you referenced, https://issues.cloudera.org/browse/DISTRO-752, "properties mapreduce.map.java.opts.max.heap, mapreduce.reduce.java.opts.max.heap do not seem to have any effects". So users should go with mapreduce.map/reduce.java.opts=-Xmx until that bug is fixed. I have a case open with Cloudera to confirm that's how it works. – Tagar Jan 28 '16 at 05:55
  • Yes. I use standard Hadoop settings. – Manjunath Ballur Jan 28 '16 at 05:58

Should I use the former or the latter set of properties?

The answer depends on whether you set them in Cloudera Manager (CM) or not.

If in CM, then mapreduce.map/reduce.java.opts.max.heap are preferable, since they are dedicated parameters for tuning exactly the heap of mappers/reducers. mapreduce.map/reduce.java.opts is more generic, and using it just to set heap memory is more convoluted because you also need to include the -Xmx flag.

If you plan to use them anywhere else, then the answer is: don't. They don't exist anywhere except Cloudera Manager. Read the comments in https://issues.cloudera.org/browse/DISTRO-752: Cloudera will most likely remove that parameter name and fix the documentation to avoid confusion.

Tagar