
I am working on PDF document clustering on Hadoop, so I am learning MapReduce by reading examples on the internet. The WordCount example contains these lines:

job.get("map.input.file")
job.getBoolean()

What do these methods do? What exactly is map.input.file, and where is it set? Or is it just a name given to the input folder? Please post an answer if you know.

For the code, see the WordCount v2.0 example in the Hadoop tutorial: http://hadoop.apache.org/docs/r1.0.4/mapred_tutorial.html

Amar
user2200278

1 Answer


These calls read the job configuration, i.e. the set of properties that is passed to every mapper and reducer. The configuration contains both properties that Hadoop/MapReduce defines itself and properties that the user defines.

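In your case, map.input.file is a pre-defined property. You do not set it yourself: the framework sets it for each map task to the path of the file that task is reading. So it is not the name of the input folder (in Hadoop 1.x the comma-separated list of input paths is held in a different property, mapred.input.dir).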

wordcount.skip.patterns, on the other hand, is a custom property defined by the example itself. It is set according to the user's input, and you can see it being set in run() as follows:

conf.setBoolean("wordcount.skip.patterns", true);

As for when to use get and when to use getBoolean: it depends on the type of the value. For a boolean property, use setBoolean to set it and getBoolean to read it; getBoolean also takes a default value that is returned when the property has not been set. There are similar typed methods for other data types. For a string value, use get().
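To make the get / getBoolean distinction concrete, here is a minimal sketch. It does not use Hadoop at all: ConfSketch is a hypothetical stand-in class, written so the example runs without Hadoop on the classpath. Only the method names (set, get, setBoolean, getBoolean with a default) mirror the ones discussed above; the file path and the example.unset.flag key are made up for illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a job configuration. This is NOT a Hadoop class;
// it only imitates the shape of the calls discussed in the answer.
class ConfSketch {
    // Every value is stored as a string, keyed by property name.
    private final Map<String, String> props = new HashMap<>();

    void set(String name, String value) {
        props.put(name, value);
    }

    // Returns the raw string, or null if the property was never set.
    String get(String name) {
        return props.get(name);
    }

    void setBoolean(String name, boolean value) {
        set(name, Boolean.toString(value));
    }

    // Parses the stored string; returns defaultValue when the property
    // is missing or is neither "true" nor "false".
    boolean getBoolean(String name, boolean defaultValue) {
        String raw = get(name);
        if ("true".equals(raw)) return true;
        if ("false".equals(raw)) return false;
        return defaultValue;
    }

    public static void main(String[] args) {
        ConfSketch job = new ConfSketch();

        // Custom property, set by the driver as in run().
        job.setBoolean("wordcount.skip.patterns", true);

        // In a real job the framework fills this in per map task;
        // the path below is a made-up example value.
        job.set("map.input.file", "/example/input/file01");

        System.out.println(job.get("map.input.file"));
        System.out.println(job.getBoolean("wordcount.skip.patterns", false));
        // Hypothetical key that was never set, so the default is returned.
        System.out.println(job.getBoolean("example.unset.flag", true));
    }
}
```

In an actual mapper you would make the same calls on the JobConf object that Hadoop passes to configure(), rather than on a class like this one.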

Amar