
I know that I can spin up an EC2 cluster with Hadoop installed (unless I am wrong about that). What about HBase? Can I get Hadoop and HBase pre-made, ready to go? Or do I need to get my hands dirty? If that is not an option, what is the best alternative? Cloudera apparently has a package with both. Is that the way to go?

Thanks for the help.

delmet
  • You can have whatever you want... spin up a server, install whatever you want on it, create an image and save it. Then you can launch infinite copies of that server with the software already installed. – Dan Grossman Feb 25 '11 at 03:29
  • While you can do this, my answer below has a pre-made image ready, but you might want to save your own version of it so you can always access it, just in case the other one is deleted. – Mike Mar 02 '11 at 19:18
  • Check this link, it may be helpful: http://docs.aws.amazon.com/ElasticMapReduce/latest/DeveloperGuide/emr-hbase-launch.html – mst Sep 02 '15 at 16:07
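
A rough sketch of the build-once, clone-many workflow described in the first comment, using the present-day AWS CLI; the AMI/instance IDs, key name, and instance type below are placeholders, not values from this question:

# 1. Launch a base instance and install Hadoop/HBase on it by hand.
aws ec2 run-instances --image-id ami-xxxxxxxx --instance-type m1.large --key-name my-key

# 2. Once it is configured, snapshot it as a reusable AMI.
aws ec2 create-image --instance-id i-0123456789abcdef0 --name "hadoop-hbase-base"

# 3. Launch as many copies of that AMI as you need.
aws ec2 run-instances --image-id ami-yyyyyyyy --count 5 --instance-type m1.large --key-name my-key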

3 Answers


HBase has a set of EC2 scripts that get you set up and ready to go very quickly. They let you configure the number of ZooKeeper servers as well as slave nodes, but I'm not sure which versions they are available in. I'm using 0.20.6. After setting up some of your S3/EC2 information, you can do things like:

/usr/local/hbase-0.20.6/contrib/ec2/bin/launch-hbase-cluster CLUSTERNAME SLAVES ZKSERVERS

to quickly start using the cluster. It's nice because it installs LZO support for you as well.
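
For example (the cluster name and node counts below are made up, just to show the argument order):

# launch a cluster named "testcluster" with 3 slave nodes and 3 ZooKeeper servers
/usr/local/hbase-0.20.6/contrib/ec2/bin/launch-hbase-cluster testcluster 3 3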

Here are some params from the environment file in the bin directory that might be useful (if you want a 0.20.6 AMI):

# The version of HBase to use.
HBASE_VERSION=0.20.6

# The version of Hadoop to use.
HADOOP_VERSION=0.20.2

# The Amazon S3 bucket where the HBase AMI is stored.
# Change this value only if you are creating your own (private) AMI
# so you can store it in a bucket you own.
#S3_BUCKET=apache-hbase-images
S3_BUCKET=720040977164

# Enable public access web interfaces
ENABLE_WEB_PORTS=false

# Extra packages
# Allows you to add a private Yum repo and pull packages from it as your
# instances boot up. Format is <repo-descriptor-URL> <pkg1> ... <pkgN>
# The repository descriptor will be fetched into /etc/yum/repos.d.
EXTRA_PACKAGES=

# Use only c1.xlarge unless you know what you are doing
MASTER_INSTANCE_TYPE=${MASTER_INSTANCE_TYPE:-c1.xlarge}

# Use only c1.xlarge unless you know what you are doing
SLAVE_INSTANCE_TYPE=${SLAVE_INSTANCE_TYPE:-c1.xlarge}

# Use only c1.medium unless you know what you are doing
ZOO_INSTANCE_TYPE=${ZOO_INSTANCE_TYPE:-c1.medium}

You also might need to set your Java version if JAVA_HOME is not set in the AMI (and I don't think it is). Newer versions of HBase are probably available in S3 buckets; just run a describe-images call and grep for hadoop/hbase to narrow the results.
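
Because the instance-type settings above use the ${VAR:-default} form, you should also be able to override them from your shell before launching instead of editing the file; a small sketch, assuming the launch script sources this environment file (the values are examples only):

# Override the defaults for a cheaper test cluster (example values)
export MASTER_INSTANCE_TYPE=m1.large
export SLAVE_INSTANCE_TYPE=m1.large
export ZOO_INSTANCE_TYPE=m1.small

/usr/local/hbase-0.20.6/contrib/ec2/bin/launch-hbase-cluster testcluster 3 3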

Mike

From what I've heard, the easiest and fastest way to get HBase running on EC2 is to use Apache Whirr.
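
For reference, a minimal sketch of what a Whirr-based launch tends to look like; the property values and role layout below are illustrative (in the style of Whirr's stock HBase recipes), not something taken from this answer:

# hbase-ec2.properties -- illustrative Whirr configuration
whirr.cluster-name=hbase-test
whirr.instance-templates=1 zookeeper+hadoop-namenode+hadoop-jobtracker+hbase-master,3 hadoop-datanode+hadoop-tasktracker+hbase-regionserver
whirr.provider=aws-ec2
whirr.identity=${env:AWS_ACCESS_KEY_ID}
whirr.credential=${env:AWS_SECRET_ACCESS_KEY}

# launch and, later, tear down the cluster
whirr launch-cluster --config hbase-ec2.properties
whirr destroy-cluster --config hbase-ec2.properties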

André

Are you aware of Amazon Elastic MapReduce? It doesn't offer HBase, but it does offer plain ol' Hadoop, Hive, and Pig (in fairly recent versions). The big win is that they don't start charging you until 90% of your nodes are up; the downside is that there is a slight premium per hour over normal EC2.
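
For illustration, launching such a cluster with the present-day AWS CLI looks roughly like this (the release label, instance type, and count are placeholders; the elastic-mapreduce Ruby CLI of the time used different flags):

# Hypothetical EMR cluster with Hadoop, Hive, and Pig installed
aws emr create-cluster \
  --name "hadoop-hive-pig-test" \
  --release-label emr-5.36.0 \
  --applications Name=Hadoop Name=Hive Name=Pig \
  --instance-type m5.xlarge \
  --instance-count 3 \
  --use-default-roles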

If you really need/want to use HBase, then you may be better off spinning something up yourself. See the following Cloudera blog post for a discussion of Hive and HBase integration: http://www.cloudera.com/blog/2010/06/integrating-hive-and-hbase/
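
To give a flavor of that integration, the Hive-side mapping looks roughly like this (the table, column, and column-family names are made up; the storage handler class is the standard Hive HBase handler):

# Hypothetical example: expose an existing HBase table called "users" to Hive
hive -e "
CREATE EXTERNAL TABLE hbase_users(key string, name string)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ('hbase.columns.mapping' = ':key,info:name')
TBLPROPERTIES ('hbase.table.name' = 'users');
"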

Joe Harris
  • We have decided to go with EMR. It is easy to use, that is for sure. I am deferring HBase until later; MySQL seems fine for the time being. Sooner or later I will have to look into that; hopefully, by then EC2 will have a Hive offering. – delmet Mar 24 '11 at 05:20
  • Good stuff, thanks for the update. Let us know how you find it. – Joe Harris Mar 28 '11 at 13:07
  • You have probably seen this, but EMR now offers HBase and Hive. – prestomation Oct 08 '12 at 18:21