
Can anybody please help me connect programmatically from Ruby to HBase running on Amazon EMR?

Specifically, I want to bulk-import data into a column-oriented HBase table on Amazon EMR and then retrieve that data programmatically using aggregation/group-by style queries.

I've gone through https://github.com/aws/aws-sdk-ruby, https://github.com/CompanyBook/massive_record and the hbase-stargate gem, but none of them seems to have a clear explanation with examples.

Thanks in advance.

1 Answer


Thrift is the way most people access HBase from outside the JVM. The massive_record gem you linked uses the Thrift bindings. So spin up a Thrift server that points at your EMR cluster, and then point your Ruby client at the Thrift server.
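HBase's Thrift API exposes gets, puts and scans but no GROUP BY, so for the aggregation part of the question one option is to scan the rows through a Thrift client and aggregate them in Ruby. The sketch below assumes the scanned rows have already been turned into plain Ruby hashes of `"family:qualifier" => value`; the `group_sum` helper, the `d:region`/`d:amount` columns and the row shape are hypothetical, chosen only for illustration:

```ruby
# Hypothetical helper, not part of massive_record or the HBase Thrift API.
# rows: an Array of Hashes mapping "family:qualifier" => String value, e.g. built
# from the results of an HBase scan. Values are assumed to be decimal strings.
def group_sum(rows, group_col, value_col)
  rows.each_with_object(Hash.new(0)) do |row, totals|
    key = row[group_col]
    value = row[value_col]
    next if key.nil? || value.nil? # skip rows missing either column
    totals[key] += Integer(value)  # raises ArgumentError on non-numeric values
  end
end

# Example rows standing in for scan results (column names are made up).
rows = [
  { "d:region" => "eu", "d:amount" => "10" },
  { "d:region" => "us", "d:amount" => "5" },
  { "d:region" => "eu", "d:amount" => "7" },
  { "d:region" => "us" } # no amount column, so it is skipped
]

# Roughly the equivalent of: SELECT region, SUM(amount) ... GROUP BY region
totals = group_sum(rows, "d:region", "d:amount")
puts totals
```

Pulling every row to the client like this only suits small tables; for large ones, running the aggregation on the cluster (for example with Hive or a MapReduce job on EMR) is usually a better fit.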

eclark
  • To add to this answer, two more points: (1) the Thrift port is 9090 by default, including on Amazon EMR; (2) make sure to open port 9090 in your EMR security group to the IP of the machine you are using to access HBase. – Suman Nov 20 '12 at 18:04
  • @eclark: Thanks for your response. Since I'm new to Amazon Web Services such as EMR, and to Thrift servers, could you please elaborate on the answer or point me to links with a detailed explanation? Thanks again in advance. – user1509711 Nov 20 '12 at 19:55
  • @Suman: Thanks for your response. Since I'm new to Amazon Web Services such as EMR, and to Thrift servers, could you please elaborate on the answer or point me to links with a detailed explanation? Thanks again in advance. – user1509711 Nov 20 '12 at 19:55
  • NP. But it's way too long for an SO reply. :/ I'm planning to write a blog post about it soon and will post a link here when I'm done. – Suman Nov 20 '12 at 19:58
  • @Suman: Please tell me whether my understanding is correct. I think I need to create a Hive job flow on Amazon EMR, use that server as a Thrift server pointing at the HBase EMR cluster, and then point the massive_record Ruby client at the Thrift server. If so, can you please tell me how to open port 9090 in the EMR security group? – user1509711 Nov 21 '12 at 19:25
  • I think that makes sense, though I haven't used Hive to load data into HBase. You have to add port 9090 to the "ElasticMapReduce-master" EC2 security group, and also run "hbase-daemon.sh start thrift" on your HBase master node to get this to work. – Suman Nov 21 '12 at 19:27
  • @Suman: In eclark's answer, he said to "spin up a thrift server that points at your emr cluster". The massive_record Ruby gem uses a Thrift connection to reach HBase running on EMR. Which machine acts as the Thrift server here? Does the HBase server itself act as the Thrift server, or do I need to create a Hive job flow and make that server act as the Thrift server? – user1509711 Nov 21 '12 at 20:01
  • How to start up an HBase cluster with Thrift on EMR: http://sumanrs.wordpress.com/2012/11/26/gentle-introduction-hbase-amazon-elastic-map-reduce-emr-hbase-cluster/ – Suman Nov 26 '12 at 22:52
  • I wrote an extensive post on this precise subject, with working examples and step-by-step instructions, **using HBase on Ruby via Thrift**: [http://www.jlescure.com/blog/hadoop-for-rubyists-quick-intro-to-hbase/](http://www.jlescure.com/blog/hadoop-for-rubyists-quick-intro-to-hbase/) – JeanLescure Apr 17 '14 at 16:15
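Once "hbase-daemon.sh start thrift" has been run on the master node and port 9090 has been opened in the "ElasticMapReduce-master" security group, as described in these comments, it can help to confirm from the client machine that the port is actually reachable before debugging massive_record or another Thrift client. A minimal sketch using only Ruby's standard `socket` library; the `port_open?` helper, its 5-second default timeout and the command-line argument handling are illustrative assumptions, not something from the thread:

```ruby
require "socket"

# Hypothetical helper: returns true if a TCP connection to host:port succeeds
# within `timeout` seconds. A successful connect only shows that *something*
# is listening on the port, not that it is the HBase Thrift server.
def port_open?(host, port = 9090, timeout: 5)
  Socket.tcp(host, port, connect_timeout: timeout) { true } # socket is closed after the block
rescue SystemCallError, SocketError, IOError
  # Connection refused (nothing listening), a timeout (often a firewall or
  # security group dropping packets), or a host name that does not resolve.
  false
end

# Pass your EMR master node's public DNS name as the first argument.
host = ARGV.fetch(0, "localhost")
if port_open?(host)
  puts "#{host}:9090 is reachable"
else
  puts "#{host}:9090 is NOT reachable (Thrift server down or port blocked?)"
end
```

As a rough rule, a timeout usually points at the security group, while an immediate refusal usually means nothing is listening on 9090 yet.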