
I'm trying to install a custom Hadoop implementation (>2.0) on Google Compute Engine using the command line option. The modified parameters of my bdutil_env.sh file are as follows:

GCE_IMAGE='ubuntu-14-04'
GCE_MACHINE_TYPE='n1-standard-1'
GCE_ZONE='us-central1-a'
DEFAULT_FS='hdfs'
HADOOP_TARBALL_URI='gs://<mybucket>/<my_hadoop_tar.gz>'

The ./bdutil deploy fails with exit code 1. I find the following errors in the resulting debug.info file:

ssh: connect to host 130.211.161.181 port 22: Connection refused
ERROR: (gcloud.compute.ssh) [/usr/bin/ssh] exited with return code [255].
ssh: connect to host 104.197.63.39 port 22: Connection refused
ssh: connect to host 104.197.7.106 port 22: Connection refused
ERROR: (gcloud.compute.ssh) [/usr/bin/ssh] exited with return code [255].
ERROR: (gcloud.compute.ssh) [/usr/bin/ssh] exited with return code [255].
.....
.....
Connection to 104.197.7.106 closed.
ERROR: (gcloud.compute.ssh) [/usr/bin/ssh] exited with return code [123].
Connection to 104.197.63.39 closed.
ERROR: (gcloud.compute.ssh) [/usr/bin/ssh] exited with return code [123].
Connection to 130.211.161.181 closed.
ERROR: (gcloud.compute.ssh) [/usr/bin/ssh] exited with return code [123].
...
...
hadoop-w-1: ==> deploy-core-setup_deploy.stderr <==
....
....
hadoop-w-1: dpkg-query: package 'libsnappy1' is not installed and no information is available
hadoop-w-1: Use dpkg --info (= dpkg-deb --info) to examine archive files,
hadoop-w-1: and dpkg --contents (= dpkg-deb --contents) to list their contents.
hadoop-w-1: dpkg-preconfigure: unable to re-open stdin: No such file or directory
hadoop-w-1: dpkg-query: package 'libsnappy-dev' is not installed and no information is available
hadoop-w-1: Use dpkg --info (= dpkg-deb --info) to examine archive files,
hadoop-w-1: and dpkg --contents (= dpkg-deb --contents) to list their contents.
hadoop-w-1: dpkg-preconfigure: unable to re-open stdin: No such file or directory
hadoop-w-1: ./hadoop-env-setup.sh: line 612: Package:: command not found
....
....
hadoop-w-1: find: `/home/hadoop/hadoop-install/lib': No such file or directory

I don't understand why the initial ssh errors appear; I can see the VMs and log in to them from the UI without any problem, and my tar.gz has also been copied to the proper places.
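As an aside, the two exit codes in the log point at different failure modes: ssh itself reserves exit status 255 for connection-level failures, while any other non-zero status is whatever the remote command returned. A small sketch of that distinction (the function name is hypothetical, just for illustration):

```shell
#!/bin/bash
# ssh reserves exit status 255 for its own failures (e.g. "Connection
# refused"); any other non-zero status is the remote command's exit code,
# which is what the later [123] errors report.
describe_ssh_exit() {
  local code="$1"
  if [ "$code" -eq 255 ]; then
    echo "ssh-level failure (connection refused or timeout)"
  elif [ "$code" -ne 0 ]; then
    echo "remote command exited with status $code"
  else
    echo "success"
  fi
}

describe_ssh_exit 255   # -> ssh-level failure (connection refused or timeout)
describe_ssh_exit 123   # -> remote command exited with status 123
```

So the [255] errors above mean ssh never connected, while the later [123] errors mean the connection worked but the remote setup command failed.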

I also do not understand why libsnappy wasn't installed; is there anything particular I need to do? The shell scripts appear to contain commands to install it, but they are failing somehow.
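As the accepted answer below confirms, bdutil's helper script probes with dpkg -s and only installs a package when the probe fails, so the "is not installed" lines are the probe, not the failure. A dry-run sketch of that pattern (install_if_missing is a hypothetical name; the echo stands in for the real apt-get call):

```shell
#!/bin/bash
# Dry-run sketch of the probe-then-install pattern in bdutil_helpers.sh.
# The dpkg-query "not installed" messages in the log come from exactly
# this kind of probe, so on their own they are harmless.
install_if_missing() {
  local pkg="$1"
  if dpkg -s "$pkg" > /dev/null 2>&1; then
    echo "$pkg is already installed"
  else
    # The real script would run: sudo apt-get install -y "$pkg"
    echo "$pkg missing; would run: sudo apt-get install -y $pkg"
  fi
}

# Output depends on what the current image already has installed
install_if_missing libsnappy1
install_if_missing libsnappy-dev
```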

I checked all the VMs; Hadoop is not up.

EDIT: To try to resolve the ssh problem, I ran the following command:

gcutil --project= addfirewall --allowed=tcp:22 default-ssh

It made no difference.
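For what it's worth, gcutil has since been deprecated; a roughly equivalent rule with the newer gcloud CLI (assuming the default network; the rule name default-ssh is arbitrary) would be:

```shell
# Assumes the active project is already set (gcloud config set project <id>);
# opens TCP port 22 on the default network. Rule name is arbitrary.
gcloud compute firewall-rules create default-ssh \
    --network default \
    --allow tcp:22
```

Note that new GCE projects normally already include a default-allow-ssh rule on the default network, which is consistent with this change making no difference.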

2 Answers


In this case, the ssh and libsnappy errors are red herrings; when the VMs weren't immediately SSH-able, bdutil polled for a while until they were, and should eventually have printed something like:

...Thu May 14 16:52:23 PDT 2015: Waiting on async 'wait_for_ssh' jobs to finish. Might take a while...
...
Thu May 14 16:52:33 PDT 2015: Instances all ssh-able

Likewise, the libsnappy error you saw was a red herring: it comes from a call to dpkg -s that checks whether a package is already installed so that, if it isn't, bdutil can apt-get install it: https://github.com/GoogleCloudPlatform/bdutil/blob/master/libexec/bdutil_helpers.sh#L163

We'll work on cleaning up these error messages since they can be misleading. In the meantime, the main issue here is that Ubuntu hasn't historically been one of the supported images for bdutil; we thoroughly validate CentOS and Debian images, but not Ubuntu images, since they were only added as GCE options in November 2014. Your deployment should work fine with your custom tarball for any debian-7 or centos-6 image. We've filed an issue on GitHub to track Ubuntu support for bdutil: https://github.com/GoogleCloudPlatform/bdutil/issues/29
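Following this advice, the fix on the configuration side is a one-line change in bdutil_env.sh; the exact image name below is an assumption, so check gcloud compute images list for what is currently available:

```shell
# bdutil_env.sh: switch from Ubuntu to a validated Debian image
# (image name is an assumption; verify with `gcloud compute images list`)
GCE_IMAGE='debian-7-backports'
```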

EDIT: The issue has been resolved and Ubuntu is now supported at head in the master repository; you can download the most recent commit from there.

Dennis Huo
  • Thanks for clearing this up. It would be really helpful if the bdutil documentation page mentioned that support is tested only for CentOS and Debian (and now Ubuntu); that might make things easier for other people later. Our Hadoop tarball was built and internally tested only on Ubuntu, which is why we set our Google instance image to Ubuntu as well. – darshanvalia May 26 '15 at 05:05
  • I see this now as a newbie mistake; will try and use either Debian/CentOS images for all cloud projects from now on. – darshanvalia May 26 '15 at 05:16

Looking at your error output, it seems you need to add the Snappy libraries to your classpath. If you are using Java, you can download them from https://github.com/xerial/snappy-java, or try https://code.google.com/p/snappy/.
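If the cluster does come up, one way to verify whether Hadoop actually picked up a native Snappy library is the checknative subcommand available in Hadoop 2.x; run it on a cluster node:

```shell
# Lists Hadoop's native library bindings; the snappy line should read
# "true" with the resolved .so path when Snappy is usable
hadoop checknative -a
```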

salmanbw
  • I took a look at the bdutil scripts; it seems that snappy should have been installed through the deployment script itself. The failure led to Hadoop not being started on any machine; it would be better if I don't have to go into the VMs themselves and install it one by one. – darshanvalia Apr 30 '15 at 20:33