
I want to run the following command:

hadoop fs -ls hdfs:///logs/ | grep -oh "/[^/]*.gz" | grep -oh "[^/]*.gz" | hadoop fs -put - hdfs:///unzip_input/input

It works when I call it from the shell after I ssh onto the master node. But it will not work if I try to call it through ssh as follows:

ssh -i /home/USER/keypair.pem hadoop@ec2-XXXX.compute-1.amazonaws.com hadoop fs -ls hdfs:///logs/ | grep -oh "/[^/]*.gz" | grep -oh "[^/]*.gz" | hadoop fs -put - hdfs:///unzip_input/input

It gives the error:

zsh: command not found: hadoop

But if I take out the last pipe the command succeeds:

ssh -i /home/USER/keypair.pem hadoop@ec2-XXXX.compute-1.amazonaws.com hadoop fs -ls hdfs:///logs/ | grep -oh "/[^/]*.gz" | grep -oh "[^/]*.gz"

From some searching, I've found that it may be due to JAVA_HOME not being set, but it is set correctly in ~/.bashrc on the master node.

The Hadoop cluster is an Amazon Elastic MapReduce cluster.

Shane
  • Are you sure that the whole command chain gets passed to ssh? Because from the error message, it looks like you just execute `hadoop fs -ls hdfs:///logs/` on the remote host and pipe the output of ssh through grep. – Carsten Feb 07 '13 at 11:48
  • Ah, I think that is what's happening. How could I change the command so the piping happens only on the remote host? – Shane Feb 07 '13 at 11:51

1 Answer


Only the first command of your piped command chain gets executed on the remote host. The rest happens locally on your computer. So, of course, if you don't have hadoop installed locally, zsh will print out a "command not found" error (and if you did have it installed, the data would just be put onto your local Hadoop, which is probably not what you want).
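You can see the same split without SSH at all. Here `sh -c`, which takes a single command string just like the remote side of `ssh`, plays the role of the remote shell, and a hypothetical `myfilter` function plays the role of `hadoop` (it exists only inside the "remote" command string, not in your local shell):

```shell
# Quoted: the entire pipeline is part of the string handed to the
# child shell, so myfilter is found there and the pipe runs "remotely".
sh -c 'myfilter() { tr a-z A-Z; }; echo hello | myfilter'
# prints HELLO

# Unquoted-style: the local shell claims the pipe, so only the first
# part reaches the child; myfilter must then exist locally -- it
# doesn't, giving the same kind of "command not found" error as with
# hadoop in the question.
sh -c 'myfilter() { tr a-z A-Z; }; echo hello' | myfilter
# fails: myfilter is not defined in the local shell
```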

To pass the whole pipeline to ssh, wrap it in quotes. Single quotes '' are the safer choice here, since the command itself already contains double quotes:

ssh -i /home/USER/keypair.pem hadoop@ec2-XXXX.compute-1.amazonaws.com 'hadoop fs -ls hdfs:///logs/ | grep -oh "/[^/]*.gz" | grep -oh "[^/]*.gz" | hadoop fs -put - hdfs:///unzip_input/input'
Carsten
  • Thanks. I was actually using the elastic-mapreduce command line to pass the command to ssh, which it turns out has a bug where it strips out the quotes I had put in. – Shane Feb 07 '13 at 11:59
  • You should also give the full path to the hadoop command (e.g. home/hadoop/bin/hadoop) in the shell script, to avoid the "command not found" error. – viper Sep 13 '13 at 17:34