Questions tagged [hbase]

HBase is the Hadoop database (columnar). Use it when you need random, real-time read/write access to your Big Data. This project's goal is the hosting of very large tables (billions of rows by millions of columns) atop clusters of commodity hardware.

HBase is an open source, non-relational, distributed, versioned, column-oriented database written in Java and modeled after Google's Bigtable, as described in "Bigtable: A Distributed Storage System for Structured Data" by Chang et al. Just as Bigtable leverages the distributed data storage provided by the Google File System, HBase provides Bigtable-like capabilities on top of the Hadoop Distributed File System (HDFS). It is developed as part of the Apache Software Foundation's Apache Hadoop project. HBase includes:

  • Convenient base classes for backing Hadoop MapReduce jobs with HBase tables, including Cascading, Hive, and Pig source and sink modules
  • Query predicate push down via server-side scan and get filters
  • Optimizations for real-time queries
  • A Thrift gateway and a RESTful web service that supports XML, Protobuf, and binary data encoding options
  • Extensible JRuby-based (JIRB) shell
  • Support for exporting metrics via the Hadoop metrics subsystem to files or Ganglia; or via JMX
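
To make the terms "sparse", "versioned", and "column-oriented" in the description above more concrete, here is a minimal in-memory sketch of a Bigtable-style data model: a sorted map from row key to column to timestamped values. It is a toy illustration under stated assumptions, not the HBase client API; the class `ToyVersionedTable` and its methods `put`, `get`, and `scan` are hypothetical names invented for this example.

```python
from collections import defaultdict


class ToyVersionedTable:
    """Toy model of a sparse, versioned, column-oriented table.

    Hypothetical illustration only; this is NOT the HBase client API.
    """

    def __init__(self):
        # row key -> "family:qualifier" -> {timestamp: value}
        # Only cells that were actually written are stored (sparse).
        self._rows = defaultdict(lambda: defaultdict(dict))

    def put(self, row, column, value, timestamp):
        # Writing the same cell with a new timestamp adds a version
        # instead of overwriting the old one.
        self._rows[row][column][timestamp] = value

    def get(self, row, column, timestamp=None):
        """Return the newest value at or before `timestamp` (latest if None)."""
        versions = self._rows.get(row, {}).get(column, {})
        candidates = [ts for ts in versions if timestamp is None or ts <= timestamp]
        return versions[max(candidates)] if candidates else None

    def scan(self, start_row, stop_row):
        """Yield (row, column names) for start_row <= row < stop_row, in key order."""
        for row in sorted(self._rows):
            if start_row <= row < stop_row:
                yield row, sorted(self._rows[row])


table = ToyVersionedTable()
table.put("row1", "cf:a", "old", timestamp=1)
table.put("row1", "cf:a", "new", timestamp=2)
table.put("row2", "cf:b", "x", timestamp=1)

latest = table.get("row1", "cf:a")                 # newest version of the cell
earlier = table.get("row1", "cf:a", timestamp=1)   # version as of timestamp 1
missing = table.get("row2", "cf:a")                # never written, so nothing stored
first_rows = list(table.scan("row1", "row2"))      # stop row is exclusive
```

Note that `row2` has no `cf:a` cell at all: nothing is stored for it, which is what "sparse" refers to here. A real cluster additionally splits the sorted row space into regions and persists the data on HDFS, none of which this sketch attempts to model.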
6961 questions
21
votes
5 answers

What is meant by sparse data/ datastore/ database?

I have been reading up on Hadoop and HBase lately, and came across this term: "HBase is an open-source, distributed, sparse, column-oriented store..." What do they mean by sparse? Does it have something to do with a sparse matrix? I am guessing it is…
Jai
  • 3,549
  • 3
  • 23
  • 31
21
votes
1 answer

HBase: How does replication work?

I'm currently evaluating HBase as a datastore, but one question was left unanswered: HBase stores many copies of the same object on many nodes (aka replication). As HBase features so-called strong consistency (in contrast to eventual consistency) it…
theomega
  • 31,591
  • 21
  • 89
  • 127
21
votes
12 answers

HBase standalone failed to connect (fail to create table)

I am trying to deploy HBase in standalone mode following this article: http://hbase.apache.org/book.html#quickstart. The version is 0.92.1-cdh4.1.2. But I am getting these errors when I try to create a table: Error message: 13/04/01 14:07:10 ERROR…
Hypnos
  • 285
  • 1
  • 3
  • 10
20
votes
1 answer

Storing data in HBase vs Parquet files

I am new to big data and am trying to understand the various ways of persisting and retrieving data. I understand both Parquet and HBase are column-oriented storage formats, but Parquet is a file-oriented storage format and, unlike HBase, not a database. My…
sovan
  • 363
  • 1
  • 4
  • 13
20
votes
2 answers

What exactly is the zookeeper quorum setting in hbase-site.xml?

What exactly is the zookeeper quorum setting in hbase-site.xml?
raj
  • 3,769
  • 4
  • 25
  • 43
20
votes
3 answers

How to add new column family to an existing HBase table?

I created a table with create 'tablename', 'columnfamily1'. Now, is it possible to add another column family 'columnfamily2'? What is the method?
proutray
  • 1,943
  • 3
  • 30
  • 48
20
votes
3 answers

How do I determine the size of my HBase tables? Is there any command to do so?

I have multiple tables in my HBase shell that I would like to copy onto my file system. Some tables exceed 100 GB. However, I only have 55 GB of free space left in my local file system. Therefore, I would like to know the size of my HBase tables so that…
gautham
  • 313
  • 1
  • 2
  • 6
20
votes
2 answers

What is meant by "HDFS lacks random read and write access"?

Any file system should provide an API to access its files and directories, etc. So, what is meant by "HDFS lacks random read and write access"? So, we should use HBase.
lovespring
  • 19,051
  • 42
  • 103
  • 153
20
votes
1 answer

Role of datanode, regionserver in HBase-Hadoop integration

From my understanding, rows are inserted into HBase tables and are stored as regions in different region servers. So, the region server stores the data. Similarly, in terms of Hadoop, data is stored in the data nodes present in the hadoop…
Manikandan Kannan
  • 8,684
  • 15
  • 44
  • 65
20
votes
1 answer

Confusion over hadoop job tracker api

I'm trying to collect some information from the job tracker. For starters, I'd like to begin by getting info on running jobs, such as job ID or job name. But I'm already stuck; here is what I've got (prints out job IDs for currently running jobs): public…
Gandalf StormCrow
  • 25,788
  • 70
  • 174
  • 263
19
votes
6 answers

HBase / Hadoop Query Help

I'm working on a project with a friend that will utilize HBase to store its data. Are there any good query examples? I seem to be writing a ton of Java code to iterate through lists of RowResults when, in SQL land, I could write a simple query. …
zechariahs
  • 533
  • 2
  • 4
  • 14
19
votes
1 answer

AWS DynamoDB VS HBase

I have been using HBase for the past six months and I came to know about DynamoDB by Amazon. Maintenance-wise, DynamoDB looks easier to handle since it's taken care of by Amazon. But whether to switch to DynamoDB from HBase is a question for me. I…
dharshan
  • 733
  • 4
  • 11
  • 24
18
votes
2 answers

What is the fastest way to bulk load data into HBase programmatically?

I have a plain text file with possibly millions of lines which needs custom parsing, and I want to load it into an HBase table as fast as possible (using Hadoop or the HBase Java client). My current solution is based on a MapReduce job without the Reduce…
Cihan Keser
  • 3,190
  • 4
  • 30
  • 43
18
votes
3 answers

The type HTable(config,tablename) is deprecated. What to use instead?

What can I use instead of HTable(config,tablename)? This method is deprecated. Every example I could find uses this or another constructor, which is also deprecated.
dino
  • 239
  • 3
  • 12
18
votes
4 answers

Cannot read large data from Phoenix table

Hi all, I am getting the error message below while running a Phoenix count query on a large table. 0: jdbc:phoenix:hadoopm1:2181> select Count(*) from PJM_DATASET; +------------+ | COUNT(1) | +------------+ java.lang.RuntimeException:…
user3683741
  • 181
  • 1
  • 5