1

HBase writes updates to a record (row key RK1) to an HFile. However, one of the older HFiles will still contain a reference to RK1. How and when is this older reference to RK1 invalidated?

Assume there is an HFile containing the record for row key RK1. RK1 is then updated, which means the update is written to a new HFile. The reference to RK1 in the older HFile must be invalidated. How and when is this done in HBase?

Thanks.

Seeker

1 Answer

0

In HDFS, files are immutable, so both the old and the new HFile will keep a reference to RK1. To avoid accumulating a large number of HFiles in HDFS, HBase periodically runs a compaction job: it merges old, small HFiles into a new, larger one and deletes the old files. The old reference to RK1 stays in its HFile until a compaction includes that file. A minor compaction gives no guarantee of this, because it runs on only some of the HFiles. A major compaction merges all the files. To force deletion of the old values, you should trigger a major compaction. Be careful with major compactions: on a huge table one can run for hours.
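To make the difference between the two compaction types concrete, here is a minimal toy model in Python. It is not HBase code: the functions `flush`, `minor_compact` and `major_compact`, the tuple layout `(row_key, timestamp, value)` and the `max_versions` parameter are all names I made up for illustration, and the model assumes a column that keeps a single version.

```python
from collections import defaultdict

# Toy model (assumption): an "HFile" is an immutable tuple of
# (row_key, timestamp, value) cells, sorted by row key, newest first.
def flush(cells):
    return tuple(sorted(cells, key=lambda c: (c[0], -c[1])))

def minor_compact(hfiles):
    """Merge only the selected files into one; every cell is kept."""
    return flush(cell for f in hfiles for cell in f)

def major_compact(hfiles, max_versions=1):
    """Merge all files and keep only the newest max_versions cells per row."""
    by_row = defaultdict(list)
    for f in hfiles:
        for cell in f:
            by_row[cell[0]].append(cell)
    kept = []
    for cells in by_row.values():
        cells.sort(key=lambda c: c[1], reverse=True)
        kept.extend(cells[:max_versions])
    return flush(kept)

old_file = flush([("RK1", 100, "v1"), ("RK2", 100, "a")])
new_file = flush([("RK1", 200, "v2")])    # the update to RK1
other_file = flush([("RK3", 150, "b")])

# A minor compaction that happens to select old_file and other_file only:
# the stale RK1 cell is simply copied into the merged file.
store_after_minor = [minor_compact([old_file, other_file]), new_file]

# A major compaction sees both RK1 cells and drops the older one.
store_after_major = [major_compact(store_after_minor)]
```

In this sketch the stale `("RK1", 100, "v1")` cell is still present after the minor compaction and is gone only after the major one, which is the behaviour described above.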

Alexander Kuznetsov
  • Thanks. How will a read operation on RK1 work then, since there are two references to the same RK1 in the HFiles? Is the older reference to RK1 invalidated when an update to RK1 arrives, so that subsequent reads for RK1 are directed to the new reference? – Seeker Nov 12 '15 at 11:11
  • HBase stores version information with each value. It reads from both files, compares the version info, and chooses the latest version. – Alexander Kuznetsov Nov 12 '15 at 12:33
  • So this means that regular compaction can reduce read latency to some extent. – Seeker Nov 13 '15 at 10:19
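
The read behaviour discussed in the comments can be sketched the same way. This is again a toy model, not the HBase read path: `read_latest` and the `(row_key, timestamp, value)` tuple layout are hypothetical, and the timestamp stands in for the version information. A real region server can also skip files using Bloom filters and time ranges, which this sketch ignores.

```python
def read_latest(hfiles, row_key):
    """Return (newest cell for row_key or None, number of files checked)."""
    best = None
    files_checked = 0
    for hfile in hfiles:
        files_checked += 1
        for cell in hfile:
            # A cell wins if it is for this row and has a newer timestamp.
            if cell[0] == row_key and (best is None or cell[1] > best[1]):
                best = cell
    return best, files_checked

# Before compaction: the old and the new RK1 cells live in separate files.
before = [
    (("RK1", 100, "v1"), ("RK2", 100, "a")),
    (("RK1", 200, "v2"),),
]
# After a major compaction: one file, stale RK1 cell removed.
after = [
    (("RK1", 200, "v2"), ("RK2", 100, "a")),
]

cell_before, checked_before = read_latest(before, "RK1")
cell_after, checked_after = read_latest(after, "RK1")
```

Both reads return the `v2` cell, but the read before compaction has to consult two files and the read after compaction only one, which is where the latency benefit would come from.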