Questions tagged [hbase]

HBase is the Hadoop database (columnar). Use it when you need random, real-time read/write access to your Big Data. The project's goal is the hosting of very large tables -- billions of rows × millions of columns -- atop clusters of commodity hardware.

HBase is an open-source, non-relational, distributed, versioned, column-oriented database modeled after Google's Bigtable (described in "Bigtable: A Distributed Storage System for Structured Data" by Chang et al.) and written in Java. It is developed as part of the Apache Software Foundation's Apache Hadoop project. Just as Bigtable leverages the distributed data storage provided by the Google File System, HBase provides Bigtable-like capabilities on top of the Hadoop Distributed File System (HDFS). HBase includes:

  • Convenient base classes for backing Hadoop MapReduce jobs with HBase tables, including Cascading, Hive, and Pig source and sink modules
  • Query predicate push-down via server-side scan and get filters (see the sketch below)
  • Optimizations for real-time queries
  • A Thrift gateway and a RESTful web service that supports XML, Protobuf, and binary data encoding options
  • Extensible JRuby-based (JIRB) shell
  • Support for exporting metrics via the Hadoop metrics subsystem to files or Ganglia, or via JMX
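
A minimal sketch of that filter push-down in the Java client (the table and column names are made up for illustration): the SingleColumnValueFilter is evaluated on the region servers, so only matching rows travel back to the client.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.TableName;
    import org.apache.hadoop.hbase.client.Connection;
    import org.apache.hadoop.hbase.client.ConnectionFactory;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.client.ResultScanner;
    import org.apache.hadoop.hbase.client.Scan;
    import org.apache.hadoop.hbase.client.Table;
    import org.apache.hadoop.hbase.filter.CompareFilter.CompareOp;
    import org.apache.hadoop.hbase.filter.SingleColumnValueFilter;
    import org.apache.hadoop.hbase.util.Bytes;

    public class FilterScan {
        public static void main(String[] args) throws Exception {
            Configuration conf = HBaseConfiguration.create(); // reads hbase-site.xml from the classpath
            try (Connection connection = ConnectionFactory.createConnection(conf);
                 Table table = connection.getTable(TableName.valueOf("emp"))) {
                Scan scan = new Scan();
                // Evaluated server-side: only rows whose personal_data:city equals "Berlin" come back.
                scan.setFilter(new SingleColumnValueFilter(
                        Bytes.toBytes("personal_data"), Bytes.toBytes("city"),
                        CompareOp.EQUAL, Bytes.toBytes("Berlin")));
                try (ResultScanner scanner = table.getScanner(scan)) {
                    for (Result result : scanner) {
                        System.out.println(Bytes.toString(result.getRow()));
                    }
                }
            }
        }
    }
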
6961 questions
2
votes
1 answer

Defining a Hive external table on top of an existing HBase table

There is an empty HBase table with two column families: create 'emp', 'personal_data', 'professional_data'. Now I am trying to map a Hive external table to it, which would naturally have some columns: CREATE EXTERNAL TABLE emp(id int, city string,…
Denys
  • 4,287
  • 8
  • 50
  • 80
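
For reference, a minimal sketch of such a mapping submitted over HiveServer2 JDBC (the URL, column list, and family:qualifier pairs are assumptions, and the Hive JDBC driver must be on the classpath). The :key token binds the Hive id column to the HBase row key.

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.Statement;

    public class CreateHiveHBaseMapping {
        public static void main(String[] args) throws Exception {
            try (Connection conn = DriverManager.getConnection("jdbc:hive2://localhost:10000/default");
                 Statement stmt = conn.createStatement()) {
                stmt.execute(
                    "CREATE EXTERNAL TABLE emp(id int, city string) "
                  + "STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler' "
                    // :key maps to the HBase row key; the rest are family:qualifier pairs
                  + "WITH SERDEPROPERTIES ('hbase.columns.mapping' = ':key,personal_data:city') "
                  + "TBLPROPERTIES ('hbase.table.name' = 'emp')");
            }
        }
    }
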
2
votes
1 answer

Spark: Read HBase in a secured cluster

I have an easy task: I want to read HBase data in a Kerberos-secured cluster. So far I have tried two approaches: sc.newAPIHadoopRDD(), where I don't know how to handle the Kerberos authentication, and creating an HBase connection from the HBase API, where I don't…
Daniel
  • 2,409
  • 2
  • 26
  • 42
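
A sketch of the first approach in the Java API, assuming a keytab login (the principal, keytab path, and table name are placeholders). Note this authenticates only the driver JVM; on a real cluster the executors also need HBase delegation tokens, e.g. via spark-submit --principal/--keytab or an HBase-Spark connector.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
    import org.apache.hadoop.hbase.mapreduce.TableInputFormat;
    import org.apache.hadoop.security.UserGroupInformation;
    import org.apache.spark.SparkConf;
    import org.apache.spark.api.java.JavaPairRDD;
    import org.apache.spark.api.java.JavaSparkContext;

    public class SecureHBaseRead {
        public static void main(String[] args) throws Exception {
            Configuration conf = HBaseConfiguration.create();
            conf.set(TableInputFormat.INPUT_TABLE, "my_table"); // hypothetical table name
            // Log in before touching HBase; principal and keytab are placeholders.
            UserGroupInformation.setConfiguration(conf);
            UserGroupInformation.loginUserFromKeytab("user@EXAMPLE.COM", "/etc/security/keytabs/user.keytab");
            JavaSparkContext sc = new JavaSparkContext(new SparkConf().setAppName("hbase-read"));
            JavaPairRDD<ImmutableBytesWritable, Result> rdd = sc.newAPIHadoopRDD(
                    conf, TableInputFormat.class, ImmutableBytesWritable.class, Result.class);
            System.out.println(rdd.count());
            sc.stop();
        }
    }
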
2
votes
1 answer

Spark Cluster Driver fails with error -

cannot assign instance of scala.collection.immutable.List$SerializationProxy to field org.apache.spark.rdd.RDD.org$apache$spark$rdd$RDD$$dependencies_ of type scala.collection.Seq in instance of…
user3300673
  • 21
  • 1
  • 2
2
votes
0 answers

HTable is not getting invoked through the Java API

First of all, thanks for viewing my question. Currently, I have installed hbase-1.2.3 and hadoop-2.7.3 on a Linux box. When executing jps, I can see the processes below running: 11015 HQuorumPeer 25843 NameNode 26173 SecondaryNameNode 25985…
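
Two things usually unblock this on HBase 1.2: point the client at the right ZooKeeper quorum, and prefer the Connection/Table API over the deprecated HTable constructor. A minimal connectivity check, with the quorum host and table name as placeholders:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.TableName;
    import org.apache.hadoop.hbase.client.Connection;
    import org.apache.hadoop.hbase.client.ConnectionFactory;
    import org.apache.hadoop.hbase.client.Get;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.client.Table;
    import org.apache.hadoop.hbase.util.Bytes;

    public class ConnectionCheck {
        public static void main(String[] args) throws Exception {
            Configuration conf = HBaseConfiguration.create();
            // A client that hangs usually cannot reach ZooKeeper; set the quorum explicitly.
            conf.set("hbase.zookeeper.quorum", "localhost"); // placeholder host
            conf.set("hbase.zookeeper.property.clientPort", "2181");
            try (Connection connection = ConnectionFactory.createConnection(conf);
                 Table table = connection.getTable(TableName.valueOf("test_table"))) { // hypothetical table
                Result result = table.get(new Get(Bytes.toBytes("row1")));
                System.out.println("Empty? " + result.isEmpty());
            }
        }
    }
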
2
votes
1 answer

HBase on Hadoop: a data locality deep dive

I have read multiple articles about how HBase gains data locality, e.g. this link or the HBase: The Definitive Guide book. I have understood that when re-writing an HFile, Hadoop writes the blocks on the same machine, which is actually the same region server…
David H
  • 1,346
  • 3
  • 16
  • 29
2
votes
2 answers

REST API for processing data stored in HBase

I have a lot of records (millions) in an HBase store, like this: key = user_id:service_id:usage_timestamp, value = some_int. That means a user used some service_id for some_int at usage_timestamp. And now I want to provide some REST API for…
Normal
  • 1,347
  • 4
  • 17
  • 34
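
Whatever framework serves the REST layer, the HBase side can stay a simple range scan, because keys sharing the user_id:service_id: prefix are stored contiguously. A sketch assuming a table named usage and 4-byte int values:

    import java.io.IOException;
    import org.apache.hadoop.hbase.TableName;
    import org.apache.hadoop.hbase.client.Connection;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.client.ResultScanner;
    import org.apache.hadoop.hbase.client.Scan;
    import org.apache.hadoop.hbase.client.Table;
    import org.apache.hadoop.hbase.util.Bytes;

    public class UsageReader {
        // Sums usage for one user_id:service_id prefix; table and key layout follow the question.
        static long sumUsage(Connection connection, String userId, String serviceId) throws IOException {
            long total = 0;
            Scan scan = new Scan();
            // Keys with the same user_id:service_id: prefix are contiguous, so this is a range scan.
            scan.setRowPrefixFilter(Bytes.toBytes(userId + ":" + serviceId + ":"));
            try (Table table = connection.getTable(TableName.valueOf("usage")); // hypothetical name
                 ResultScanner scanner = table.getScanner(scan)) {
                for (Result r : scanner) {
                    total += Bytes.toInt(r.value()); // value stored as a 4-byte int per the question
                }
            }
            return total;
        }
    }
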
2
votes
1 answer

Search for the latest rows in terms of timestamp

I am looking for how to search for the latest rows in an HBase table that is loaded by Nutch 2.3. I use happybase and Thrift; the only example I have found is at this link…
Hakim
  • 21
  • 1
  • 4
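
The question uses happybase, but the underlying idea is the same in any client: HBase sorts rows by key, not by write time, so "latest" needs either a time-range restriction on the scan or a row key that encodes the timestamp. In the Java API the former looks roughly like this (the one-hour window is an assumption):

    import java.io.IOException;
    import org.apache.hadoop.hbase.client.Scan;

    public class LatestRowsScan {
        static Scan lastHour() throws IOException {
            long now = System.currentTimeMillis();
            Scan scan = new Scan();
            scan.setTimeRange(now - 3600_000L, now); // only cells written in the last hour
            scan.setReversed(true);                  // iterate from the highest row key downward
            return scan;
        }
    }
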
2
votes
1 answer

How can we automate the incremental import in Sqoop from a DB to HBase using a Linux script

Using a Sqoop job we can do the incremental load to HBase using the --lastval. But how can we do the same with a shell script, and how will we get the --lastval when we automate the script? I mean, how to store the --lastval and how to pass it to the next…
Raj
  • 537
  • 4
  • 9
  • 18
2
votes
1 answer

How to decrease full table scan impact on an HBase cluster?

Is there any possibility to limit a poor query's impact on an HBase cluster? If yes, what needs to be achieved? Do I need Kerberos to identify users and limit their queries' impact, or to assign resources to them? Poor queries from Phoenix can kill the…
user5688790
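
One lever that exists for this (HBase 1.1+, with hbase.quota.enabled=true on the cluster) is request throttling per user or table; Kerberos only supplies the user identity. A sketch with a placeholder user name and limit:

    import java.util.concurrent.TimeUnit;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.Admin;
    import org.apache.hadoop.hbase.client.Connection;
    import org.apache.hadoop.hbase.client.ConnectionFactory;
    import org.apache.hadoop.hbase.quotas.QuotaSettingsFactory;
    import org.apache.hadoop.hbase.quotas.ThrottleType;

    public class ThrottleUser {
        public static void main(String[] args) throws Exception {
            Configuration conf = HBaseConfiguration.create();
            try (Connection connection = ConnectionFactory.createConnection(conf);
                 Admin admin = connection.getAdmin()) {
                // Cap this user at 100 requests per second; name and limit are placeholders.
                admin.setQuota(QuotaSettingsFactory.throttleUser(
                        "phoenix_user", ThrottleType.REQUEST_NUMBER, 100, TimeUnit.SECONDS));
            }
        }
    }
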
2
votes
1 answer

How to control the number of mappers per region server for reading an HBase table

I have an HBase table (written through Apache Phoenix) that needs to be read and written to a flat text file. The current bottleneck is that, as we have 32 salt buckets for that HBase (Phoenix) table, it opens only 32 mappers to read. And when the data grows over…
2
votes
1 answer

Issue running Hive on Spark with a Hive table view mapped to an HBase table: java.lang.NoSuchMethodError: org.apache.hadoop.hive.serde2.lazy

I am trying to access an HBase table by mapping it from Hive through the Spark engine. From Hive: when I run the query on the Hive view mapped to HBase, I get all the desired results. From Spark: when I run a query to fetch from the Hive table, I could get it…
Ranjan Swain
  • 75
  • 1
  • 9
2
votes
1 answer

RuntimeException MetaException(message:org.apache.hadoop.hive.serde2.SerDeException org.apache.hadoop.hive.hbase.HBaseSerDe

On an HDP cluster, I am trying to create and integrate Hive tables with existing HBase tables. It creates the Hive table, but when I try to query the Hive table it throws the following exception, especially when the number of columns exceeds 200. I…
Sangita Satapathy
  • 893
  • 1
  • 6
  • 9
2
votes
2 answers

How to send some data to the Mapper class (running on data in an HBase database)

I need to send some information to mapper jobs running on nodes against HBase. I have already defined the data as a static member in the class, but it seems that when the mapper is run on the other nodes, the data is not transferred to them. Is…
mmostajab
  • 1,937
  • 1
  • 16
  • 37
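
Statics set in the driver JVM never reach mappers on other nodes; the usual fix is to ship small values through the job Configuration. A sketch against an HBase-backed job (the key name and value are placeholders):

    import java.io.IOException;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
    import org.apache.hadoop.hbase.mapreduce.TableMapper;
    import org.apache.hadoop.io.NullWritable;
    import org.apache.hadoop.io.Text;

    public class ParamMapper extends TableMapper<Text, NullWritable> {
        private String param;

        @Override
        protected void setup(Context context) {
            // Read the value back on whichever node this mapper happens to run on.
            param = context.getConfiguration().get("my.custom.param", "default");
        }

        @Override
        protected void map(ImmutableBytesWritable row, Result value, Context context)
                throws IOException, InterruptedException {
            // use 'param' alongside each HBase row...
        }
    }

    // In the driver, before submitting the job:
    //   Configuration conf = HBaseConfiguration.create();
    //   conf.set("my.custom.param", "some value");
    //   Job job = Job.getInstance(conf, "hbase-mr");
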
2
votes
2 answers

Connect HBase to Grafana

How can HBase be configured as a datasource in Grafana? Is it possible through the HTTP API? How can Apache HBase or Spark be integrated with Grafana as a reliable datasource?
Krish
  • 31
  • 1
  • 4
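
Grafana has no native HBase datasource; the usual routes are OpenTSDB (which itself stores in HBase) or a small HTTP bridge speaking the SimpleJson datasource protocol. A minimal sketch of the latter with the JDK's built-in server (port, metric name, and datapoints are placeholders; a real bridge would build the JSON from an HBase scan):

    import java.io.IOException;
    import java.io.OutputStream;
    import java.net.InetSocketAddress;
    import java.nio.charset.StandardCharsets;
    import com.sun.net.httpserver.HttpExchange;
    import com.sun.net.httpserver.HttpServer;

    public class HBaseGrafanaBridge {
        public static void main(String[] args) throws Exception {
            HttpServer server = HttpServer.create(new InetSocketAddress(8080), 0);
            server.createContext("/", ex -> respond(ex, "ok")); // health check hit by Grafana
            server.createContext("/query", ex -> respond(ex,
                // SimpleJson expects [{"target": ..., "datapoints": [[value, epoch_millis], ...]}]
                "[{\"target\":\"usage\",\"datapoints\":[[42, 1500000000000]]}]"));
            server.start();
        }

        private static void respond(HttpExchange ex, String body) throws IOException {
            byte[] bytes = body.getBytes(StandardCharsets.UTF_8);
            ex.getResponseHeaders().set("Content-Type", "application/json");
            ex.sendResponseHeaders(200, bytes.length);
            try (OutputStream os = ex.getResponseBody()) {
                os.write(bytes);
            }
        }
    }
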
2
votes
3 answers

Storing and processing timeseries with Hadoop

I would like to store a large number of time series from devices. These time series also have to be validated, can be modified by an operator, and have to be exported to other systems. Holes in the time series must be found. Time series must be shown in…
Pablo Castilla
  • 2,723
  • 2
  • 28
  • 33
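
With HBase as the store, most of those requirements hinge on the row key design. A common pattern is device id plus a reversed timestamp, so each device's newest samples sort first and range scans can find holes cheaply (the names and the d:v column are illustrative only):

    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.util.Bytes;

    public class TimeseriesKey {
        static Put samplePut(String deviceId, long epochMillis, double value) {
            byte[] rowKey = Bytes.add(
                    Bytes.toBytes(deviceId),
                    Bytes.toBytes(Long.MAX_VALUE - epochMillis)); // reversed so newest sorts first
            return new Put(rowKey)
                    .addColumn(Bytes.toBytes("d"), Bytes.toBytes("v"), Bytes.toBytes(value));
        }
    }
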