Questions tagged [hfile]

File format for hbase. A file of sorted key/value pairs. Both keys and values are byte arrays.
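
Because keys are raw byte arrays, "sorted" needs a byte-level ordering. The minimal Java sketch below (hypothetical class SortedByteKeys, not HBase code) assumes the unsigned, lexicographic byte order commonly described for HBase row keys. It models a file's contents as a TreeMap from byte[] to byte[] and requires Java 9+ for Arrays.compareUnsigned.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Comparator;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical demo class (not HBase code): sorted key/value pairs where both are byte[].
public class SortedByteKeys {

    // Lexicographic order with each byte treated as unsigned (0x00..0xFF); Java 9+.
    public static final Comparator<byte[]> UNSIGNED_LEX = Arrays::compareUnsigned;

    public static TreeMap<byte[], byte[]> newSortedMap() {
        // byte[] uses identity equals/hashCode, so keys need an explicit comparator.
        return new TreeMap<>(UNSIGNED_LEX);
    }

    public static byte[] b(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        TreeMap<byte[], byte[]> kv = newSortedMap();
        kv.put(b("row2"), b("v2"));
        kv.put(b("row10"), b("v10"));
        kv.put(b("row1"), b("v1"));
        kv.put(new byte[] {(byte) 0x80}, b("high-bit")); // 0x80 is -128 as a signed Java byte
        for (Map.Entry<byte[], byte[]> e : kv.entrySet()) {
            System.out.println(Arrays.toString(e.getKey()) + " -> "
                    + new String(e.getValue(), StandardCharsets.UTF_8));
        }
    }
}
```

Iterating the map yields row1, row10, row2, and then the single-byte key 0x80. "row10" sorts before "row2" because the comparison is byte by byte rather than numeric. The 0x80 key sorts last only because bytes are treated as unsigned; a signed comparison would put it first.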

In HBase 0.20, MapFile was replaced by HFile, a map file implementation specific to HBase. The idea is quite similar to MapFile, but HFile adds more features than a plain key/value file: it supports metadata, and the index is now kept in the same file as the data.
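
To make "the index is kept in the same file" concrete, here is a toy Java sketch of a single self-describing sorted file. Everything in it is a hypothetical layout invented for illustration, not HFile's actual on-disk format. That includes the class name ToySortedFile, the length-prefixed fields, and the fixed 8-byte trailer holding two offsets. The sketch mirrors only the general shape: data blocks, then a block index, then a small file-info metadata map, then a trailer that tells a reader where those sections start. It assumes Java 9+ and keys supplied in unsigned lexicographic order.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.SortedMap;

// Hypothetical toy layout, NOT the real HFile byte format:
//   [data blocks][block index][file info][trailer: int indexOffset, int fileInfoOffset]
public class ToySortedFile {
    // Keys in `sorted` must already be in unsigned lexicographic order.
    public static byte[] write(SortedMap<byte[], byte[]> sorted, Map<String, String> fileInfo,
                               int entriesPerBlock) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        List<byte[]> firstKeys = new ArrayList<>();
        List<Integer> blockOffsets = new ArrayList<>();
        int n = 0;
        for (Map.Entry<byte[], byte[]> e : sorted.entrySet()) {
            if (n++ % entriesPerBlock == 0) {       // a new data block starts at this entry
                firstKeys.add(e.getKey());
                blockOffsets.add(out.size());
            }
            putField(out, e.getKey());
            putField(out, e.getValue());
        }
        int indexOffset = out.size();               // block index: first key + offset per block
        putInt(out, firstKeys.size());
        for (int i = 0; i < firstKeys.size(); i++) {
            putField(out, firstKeys.get(i));
            putInt(out, blockOffsets.get(i));
        }
        int fileInfoOffset = out.size();            // file info: small string-to-string metadata
        putInt(out, fileInfo.size());
        for (Map.Entry<String, String> e : fileInfo.entrySet()) {
            putField(out, e.getKey().getBytes(StandardCharsets.UTF_8));
            putField(out, e.getValue().getBytes(StandardCharsets.UTF_8));
        }
        putInt(out, indexOffset);                   // fixed-size trailer at the very end
        putInt(out, fileInfoOffset);
        return out.toByteArray();
    }

    // Point lookup: trailer -> index -> binary search -> scan one block only.
    public static byte[] get(byte[] file, byte[] key) {
        ByteBuffer buf = ByteBuffer.wrap(file);
        int indexOffset = buf.getInt(file.length - 8);
        buf.position(indexOffset);
        int blocks = buf.getInt();
        byte[][] firstKeys = new byte[blocks][];
        int[] offsets = new int[blocks];
        for (int i = 0; i < blocks; i++) {
            firstKeys[i] = getField(buf);
            offsets[i] = buf.getInt();
        }
        int lo = 0, hi = blocks - 1, blk = -1;      // find the last block whose first key <= key
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1;
            if (Arrays.compareUnsigned(firstKeys[mid], key) <= 0) { blk = mid; lo = mid + 1; }
            else { hi = mid - 1; }
        }
        if (blk < 0) return null;                   // key sorts before the file's first key
        int end = (blk + 1 < blocks) ? offsets[blk + 1] : indexOffset;
        buf.position(offsets[blk]);
        while (buf.position() < end) {
            byte[] k = getField(buf), v = getField(buf);
            int c = Arrays.compareUnsigned(k, key);
            if (c == 0) return v;
            if (c > 0) break;                       // passed the key's position: not present
        }
        return null;
    }

    public static Map<String, String> fileInfo(byte[] file) {
        ByteBuffer buf = ByteBuffer.wrap(file);
        buf.position(buf.getInt(file.length - 4));
        int count = buf.getInt();
        Map<String, String> info = new LinkedHashMap<>();
        for (int i = 0; i < count; i++) {
            String k = new String(getField(buf), StandardCharsets.UTF_8);
            info.put(k, new String(getField(buf), StandardCharsets.UTF_8));
        }
        return info;
    }

    private static void putInt(ByteArrayOutputStream out, int v) {
        out.write(ByteBuffer.allocate(4).putInt(v).array(), 0, 4);  // big-endian int
    }
    private static void putField(ByteArrayOutputStream out, byte[] b) {
        putInt(out, b.length);                      // length-prefixed byte array
        out.write(b, 0, b.length);
    }
    private static byte[] getField(ByteBuffer buf) {
        byte[] b = new byte[buf.getInt()];
        buf.get(b);
        return b;
    }
}
```

A point lookup with get reads the trailer and binary-searches the in-file index for the single block that could contain the key. It then scans only that block. fileInfo locates the metadata map the same way. Because both sections live in the data file, a reader needs nothing beyond this one byte array.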

HBase 0.92 introduced HFile v2, which improves speed, memory usage, and cache usage.

Blog: http://blog.cloudera.com/blog/2012/06/hbase-io-hfile-input-output/

Class: https://hbase.apache.org/devapidocs/org/apache/hadoop/hbase/io/hfile/HFile.html

36 questions
1 vote · 1 answer

Spark job failed due to non-serializable objects

I'm running a Spark job to generate HFiles for my HBase data store. It used to work fine on my Cloudera cluster, but since we switched to an EMR cluster, it fails with the following stack trace: Serialization stack: - object not serializable…
Fisher Coder
1 vote · 1 answer

Cannot run Spark jobs for large datasets

I wrote a Spark job to read Hive data in S3 and generate HFiles. This job works fine when reading only one ORC file (about 190 MB); however, when I used it to read the entire S3 directory of about 400 ORC files (roughly 400 × 190 MB = 76 GB of data),…
Fisher Coder
1 vote · 1 answer

When are the references to a row key in an older HFile removed or invalidated?

HBase writes the record updates (for a row key RK1) to an HFile. However, one of the older HFiles will still contain references to row key RK1. How and when is this older reference to RK1 invalidated? Assume there is an HFile containing the record…
Seeker
1 vote · 1 answer

What's the relationship between Hadoop's TFile and HFile?

It seems Hadoop supports both TFile and HFile. I'd like to know the difference between them and how they came about (e.g. was HFile derived from TFile?).
Igor Gatis
0 votes · 1 answer

HBase: bulk loading HFiles periodically and its relation to minor compaction

I have a scenario where we have to periodically load HFiles into an HBase table on a daily basis. The HFile size for each run could be between 50 and 150 MB per region. These loads could happen 12 times a day, and in some cases as often as every 15 minutes. While doing…
Ramdev Sharma
0 votes · 1 answer

Does HBase create an HFile for each column family or for each columnFamily:Column?

I am trying to understand the HBase architecture with respect to the logical data model vs. physical data storage. I am a little confused about HFile creation. If we have a column family with 2 columns, does HBase create 2 HFiles or just one? Below is…
AnswerSeeker
0 votes · 2 answers

Linking files in C (multiple definition of...)

I'm trying to link a few files in C and I'm getting this error: "multiple definition of createStudentList". My main.c: #include "students.h" int main(void) { return 0; } students.h: #ifndef _students_h_ #define _students_h_ #include…
zoids3
0 votes · 1 answer

Run LoadIncrementalHFiles from Java client

I want to run the equivalent of hbase org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles /user/myuser/map_data/hfiles mytable from my Java client code. When I run the application, I get the following…
D. Müller
0 votes · 1 answer

Bulk loading with LoadIncrementalHFiles and subdirectories

I wrote a Spark application that generates HFiles to be bulk loaded later with the LoadIncrementalHFiles command. As the source data pool is very big, the input files are split into iterations that are processed one after the other.…
D. Müller
0 votes · 2 answers

Spark - Create HFile for one rowKey with multiple columns

JavaRDD hbaseFile = jsc.textFile(HDFS_MASTER+HBASE_FILE); JavaPairRDD putJavaRDD = hbaseFile.mapToPair(line -> convertToKVCol1(line,…
徐琮杰
0 votes · 1 answer

HBase FileInfo block

All the HBase articles and books mention the following about the Meta and FileInfo blocks in HFiles: "The Meta block is designed to keep a large amount of data with its key as a String, while FileInfo is a simple Map preferred for small…
anuragz
0
votes
1 answer

Load data via HFile into HBase not working

I wrote a mapper to load data from disk via HFile into HBase. The program runs successfully, but no data is loaded into my HBase table. Any ideas? Here's my Java program: protected void writeToHBaseViaHFile() throws Exception { …
Fisher Coder
0 votes · 1 answer

Any ideas on how to bulk load a protocol buffer file via HFile into HBase?

Here's what I'm trying to do: load data from Hive into HBase, serialized with protocol buffers. I've tried multiple ways: creating connections directly to HBase and doing Puts into HBase. This works, but is apparently not very efficient. I imported the json…
Fisher Coder
0 votes · 0 answers

Automating an HBase upgrade

I have an appliance with a standalone HBase server which stores data on the file system. It is running HBase version 0.94.17 and is basically used to support OpenTSDB. I am trying to automate the process of upgrading to 1.2.4. The data does not…
0 votes · 1 answer

hbase NameError: uninitialized constant IS_MOB

I am using hbase-0.98.18-hadoop2. When I try to create a table with create 'MOBTable', {NAME => 'columFamily', VERSION => 1, IS_MOB => true, MOB_THRESHOLD => 102400}, I get an error: NameError: uninitialized constant IS_MOB. But I have added the…
Dehai Chen