I am following the Apache MapReduce tutorial and I am at the point of assigning input and output directories. I created both directories here:

~/projects/hadoop/WordCount/input/
~/projects/hadoop/WordCount/output/

but when I run hadoop fs, the file and directory are not found. I am running as the ubuntu user, which owns the directories and the input file.

Based on a proposed solution below, I then tried the following:

I found my HDFS directory with hdfs dfs -ls /, which is /tmp, and created input/ and output/ inside /tmp with mkdir.
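For reference, a rough sketch of those steps as HDFS shell commands (assuming the directories were meant to be created in HDFS rather than on the local filesystem):

hdfs dfs -ls /
hdfs dfs -mkdir /tmp/input /tmp/output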

Tried to copy the local .jar to HDFS:

hadoop fs -copyFromLocal ~projects/hadoop/WordCount/wc.jar /tmp

Received:

copyFromLocal: `~projects/hadoop/WordCount/wc.jar': No such file or directory
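Possibly relevant: the shell does not expand ~projects into the home directory (tilde expansion needs ~/ or ~username), so that path is passed to Hadoop literally. If the jar is at the location shown at the top of the question, the command would presumably need to be:

hadoop fs -copyFromLocal ~/projects/hadoop/WordCount/wc.jar /tmp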


Any troubleshooting ideas? Thanks

Slinky
  • Create the input dir with `hadoop fs -mkdir /input` and then run the wordcount jar as `hadoop jar wc.jar WordCount /input /output`. Let me know if this solves it – franklinsijo Feb 04 '17 at 14:33
  • Thanks, I think that will work! I was able to create input/ in hdfs. One question: How do I get my input data file into hdfs /input and do I need to create /output in the same way, or is that local? I presume your post will explain and thanks – Slinky Feb 04 '17 at 14:42
  • I have explained it as an answer – franklinsijo Feb 04 '17 at 14:53

2 Answers


As the Hadoop InvalidInputException suggests, it cannot find the location "/home/ubuntu/projects/hadoop/WordCount/input".

Is it a local path or an HDFS path? I think it is local, and that is why the input exception is happening.
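One way to check (a quick sketch using the path from the exception): list it both locally and in HDFS and see which side it actually exists on.

ls /home/ubuntu/projects/hadoop/WordCount/input
hadoop fs -ls /home/ubuntu/projects/hadoop/WordCount/input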

To execute a jar file you have to put the jar in an HDFS directory, and the input and output directories also have to be in HDFS.

Use the copyFromLocal command to copy the jar from the local filesystem to the Hadoop directory:

hadoop fs -copyFromLocal <localsrc>/wc.jar hadoop-dir
ravi

MapReduce expects the input and output paths to be directories in HDFS, not local paths, unless the cluster is configured in local mode. Also, the input directory must exist and the output directory must not.

For example:

If the input is /mapreduce/wordcount/input/, this directory must be created with all the input files in it. Use HDFS commands to create them:

hdfs dfs -mkdir -p /mapreduce/wordcount/input/
hdfs dfs -copyFromLocal file1 file2 file3 /mapreduce/wordcount/input/

where file1, file2, and file3 are locally available input files.

And if the output is /examples/wordcount/output/, the parent directories must exist but not the output/ directory itself; Hadoop creates it on job execution.

hdfs dfs -mkdir -p /examples/wordcount/
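If the job is re-run later, the output directory from the previous run also has to be removed first, since Hadoop will not overwrite an existing output path:

hdfs dfs -rm -r /examples/wordcount/output/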

The jar used for the job, in this case wc.jar, should reside locally; on execution, provide its absolute or relative local path to the command.

So the final command would look like:

hadoop jar /path/where/the/jar/is/wc.jar ClassName /mapreduce/wordcount/input/ /examples/wordcount/output/
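Applied to the setup in the question (assuming the jar is still at ~/projects/hadoop/WordCount/wc.jar and the driver class is WordCount, as suggested in the comments above), that would be something like:

hadoop jar ~/projects/hadoop/WordCount/wc.jar WordCount /mapreduce/wordcount/input/ /examples/wordcount/output/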
franklinsijo
  • Thanks for the clear explanation. It really helped me understand the problem and yes, the solution worked and better yet, I understand what I was doing wrong and know better going forward. – Slinky Feb 04 '17 at 15:23