
I am learning Hadoop and am using the sandbox on VirtualBox. I downloaded a Python script that uses the mrjob framework and ran the following command:

python RatingsBreakdown.py -r hadoop --hadoop-streaming-jar /usr/hdp/current/hadoop-mapreduce-client/hadoop-streaming-jar u.data

and got this output:

Running step 1 of 1...
Not a valid JAR: /usr/hdp/2.6.3.0-235/hadoop-mapreduce/hadoop-streaming-jar
OneCricketeer
Jacob
  • You didn't give an actual JAR file. Typically those end in `.jar`, not `-jar`. Please verify that the file path you gave actually exists. – OneCricketeer Jan 28 '18 at 15:36
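You can act on the suggestion to verify the path before launching a job by checking it in Python. This is a minimal sketch. `looks_like_jar` is a hypothetical helper and is not part of mrjob or Hadoop. It relies on the fact that a JAR is a ZIP archive, so `zipfile.is_zipfile` should accept a real one.

```python
import os
import zipfile


def looks_like_jar(path: str) -> bool:
    """Hypothetical pre-flight check (not an mrjob or Hadoop API).

    Returns True only if `path` ends in ".jar", exists as a regular file,
    and can be read as a ZIP archive (the container format JARs use).
    """
    return (
        path.endswith(".jar")
        and os.path.isfile(path)
        and zipfile.is_zipfile(path)
    )


# The path from the question ends in "-jar", so the first check already fails.
print(looks_like_jar("/usr/hdp/current/hadoop-mapreduce-client/hadoop-streaming-jar"))  # False
```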

1 Answer

lib/hadoop-mapreduce/hadoop-streaming.jar

This is the JAR on my computer. A valid JAR file name ends with `.jar`, so the path in your command is wrong. You can `cd` into the folder to check the actual file name, or use Tab completion when typing the path to avoid this kind of mistake.

HbnKing
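Besides `cd` and Tab completion, you can script the search for the JAR. This is a minimal sketch that assumes the HDP layout implied by the `/usr/hdp/...` paths in the question. The `/usr/hdp` default root and the `find_streaming_jars` name are illustrative assumptions, and the JAR may live elsewhere on other installations.

```python
import glob
import os


def find_streaming_jars(root: str = "/usr/hdp") -> list[str]:
    """Return sorted paths of files under `root` named like hadoop-streaming*.jar.

    The default root is an assumption based on the HDP paths in the question.
    """
    # "**" with recursive=True matches zero or more directory levels.
    pattern = os.path.join(root, "**", "hadoop-streaming*.jar")
    return sorted(p for p in glob.glob(pattern, recursive=True) if os.path.isfile(p))


for jar in find_streaming_jars():
    print(jar)
```

This is the same search as looking for "hadoop-streaming*.jar" by hand. Any result can then be passed to `--hadoop-streaming-jar` as a literal path.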
  • Hi, thanks for your help! You're right. I searched the Hadoop folder for a file matching **"hadoop-streaming*.jar"** and found one, so I made an alias for it. But the next step still doesn't work. The command I use is **python name.py u.data -r hadoop --hadoop-streaming-jar HADOOP-STREAM > result.out**, where name.py is my mrjob script and u.data is the data file. I get a lot of STDERR output showing: Creating temp directory -> Copying local files to HDFS -> STDERR Unable to load native-hadoop library -> and more. BTW, it works locally but not on Hadoop. – Jacob Jan 31 '18 at 01:04
  • @Jacob That sounds like a separate question. Did you use the **-file** or **-files** arguments? I don't know much more about Python. – HbnKing Feb 05 '18 at 06:12
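If the alias in Jacob's command was meant to stand in for the JAR path, note that shells such as bash expand an alias only when it is the first word of a command. An alias name used as an argument is therefore passed through as literal text. One way around this is to pass the real path explicitly. The sketch below builds the same mrjob command as an argument list and runs it with `subprocess`. `build_mrjob_command` and `run_mrjob` are hypothetical helpers, and the script, data, and JAR paths are whatever you use on your machine.

```python
import shlex
import subprocess
import sys


def build_mrjob_command(script: str, data: str, jar: str) -> list[str]:
    """Build the argv for running an mrjob script with the Hadoop runner."""
    return [
        sys.executable,                 # the current Python interpreter
        script,
        "-r", "hadoop",
        "--hadoop-streaming-jar", jar,  # the real JAR path, not an alias name
        data,
    ]


def run_mrjob(script: str, data: str, jar: str,
              output_path: str = "result.out") -> subprocess.CompletedProcess:
    """Run the job; like "> result.out", stdout goes to the file and STDERR stays on the terminal."""
    cmd = build_mrjob_command(script, data, jar)
    print("Running:", shlex.join(cmd), file=sys.stderr)
    with open(output_path, "w") as out:
        return subprocess.run(cmd, stdout=out, check=True)
```

For example, call `run_mrjob("name.py", "u.data", "/path/to/hadoop-streaming.jar")` and replace the last argument with the JAR path you actually found. Using an argument list instead of a shell string means no alias or variable expansion is involved.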