2

Do you use compression with your index tables in HBase? If so, what type of compression do you use?

I have noticed that my index tables are very big and grow each day. After adding new storage, they are even bigger.

For example, table A has a size of 108.3 G.

In /apps/hbase/data/data/default, the index table has a size of 380.0 G,

and in /apps/hbase/data/archive/data/default, the index table has a size of 1.2 T.

Could you advise me what to do about the size of the index tables?

Why is the data in the archive on HDFS so big? /apps/hbase/data/archive/data/default

Can the size of the archive directory on HDFS be managed in some way? The archive takes more than 2/3 of my HDFS space.

I have also noticed that three tables have more than a hundred 'split regions', while the other tables have none. Do you know what the reason could be?

Ram Ghadiyaram

2 Answers

3

I found on the staging environment that the large amount of data in /apps/hbase/data/archive/ is caused by daily HBase snapshots that run from cron.

So now I will rewrite the script to keep only one or two snapshots per table.
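A rough sketch of what such a retention step could look like. It assumes the snapshot names of one table end in a sortable date suffix such as `idx-20161214`; the naming scheme, the function names, and the keep count are assumptions for illustration, not the original cron script. The functions only decide which snapshots to drop and print `delete_snapshot` commands for the HBase shell. Once no snapshot references an archived HFile any more, HBase's cleaner should be able to remove it from the archive directory.

```shell
# Print the snapshot names that fall outside the newest KEEP entries.
# Reads one snapshot name per line on stdin and assumes the names sort
# chronologically (for example a YYYYMMDD suffix), which is an assumption.
snapshots_to_delete() {
  keep="$1"
  # Sort newest first, then skip the first KEEP lines.
  sort -r | tail -n +"$((keep + 1))"
}

# Turn each snapshot name on stdin into an HBase shell delete_snapshot command.
delete_commands() {
  while IFS= read -r name; do
    if [ -n "$name" ]; then
      printf "delete_snapshot '%s'\n" "$name"
    fi
  done
}

# Hypothetical usage from cron (names.txt lists the snapshots of one table,
# and the installed HBase shell is assumed to accept the -n flag):
#   snapshots_to_delete 2 < names.txt | delete_commands | hbase shell -n
```

Printing the commands first, instead of deleting directly, makes it easy to review the list before piping it into the shell.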

0

Yes, I used Snappy, like this:

 create 't1', { NAME => 'cf1', COMPRESSION => 'SNAPPY' }

Compression support check

Use CompressionTest to verify snappy support is enabled and the libs can be loaded ON ALL NODES of your cluster:

$ hbase org.apache.hadoop.hbase.util.CompressionTest hdfs://host/path/to/hbase snappy

Compression would help with most of your questions above. Also look at my answer to see how it helped.
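For an index table that already exists, compression can be switched on with `alter`. A major compaction then rewrites the existing store files, because the new setting only applies to HFiles written after the change. A minimal sketch that prints the HBase shell commands; the function name and the table name `my_index` are placeholders, and on older HBase versions the table may need to be disabled before `alter`.

```shell
# Emit HBase shell commands that enable Snappy on one column family
# and then rewrite the existing store files so they are compressed too.
compression_commands() {
  table="$1"
  family="$2"
  printf "alter '%s', { NAME => '%s', COMPRESSION => 'SNAPPY' }\n" "$table" "$family"
  printf "major_compact '%s'\n" "$table"
}

# Hypothetical usage (assumes the HBase shell accepts the -n flag):
#   compression_commands my_index cf1 | hbase shell -n
```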

I have also noticed that three tables have more than a hundred 'split regions', while the other tables have none. Do you know what the reason could be?

  • Make sure that you pre-split the table into a finite number of regions, for example on the key prefixes 0-9.
  • Run a compaction over the table's regions.
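To illustrate the pre-split suggestion: assuming row keys start with a decimal digit, the split points `1` to `9` give ten regions. The sketch below builds the matching `create` statement for the HBase shell; the function name and the table name `my_index` are placeholders.

```shell
# Build an HBase shell create statement pre-split on single-digit prefixes.
# Nine split points (1..9) yield ten regions, one per leading digit 0..9.
presplit_create_command() {
  table="$1"
  family="$2"
  splits=""
  for d in 1 2 3 4 5 6 7 8 9; do
    # Append the quoted split point, comma-separated after the first one.
    splits="${splits:+$splits, }'$d'"
  done
  printf "create '%s', '%s', SPLITS => [%s]\n" "$table" "$family" "$splits"
}

# Hypothetical usage (assumes the HBase shell accepts the -n flag):
#   presplit_create_command my_index cf1 | hbase shell -n
```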
Ram Ghadiyaram
  • Thanks, I will add Snappy compression to the index tables. I just wanted to know if it is common practice in large environments with index tables. –  Dec 14 '16 at 11:05