
I have read that in HBase I should not have more than 2-3 column families in a table. I know that to fetch entries from each column family in a row I need to do a separate scan, but I still don't understand what the issue will be if I have more column families.

In my case I want to store 20 images of around 10 KB each in HBase, each with a different dimension (m x n). A request generally asks for a particular dimension, and I need to serve that image. If I put all these images in a single column family, all 20 images of different dimensions will unnecessarily be loaded into memory for caching (if a request comes again for the same image, it is certain to be for the same dimension). On the other hand, if I keep 20 column families (one per dimension), only the required image will be loaded into RAM for caching.

Thomas Jungblut
Harsh Sharma

1 Answer


I would suggest storing the different dimensions in different columns within the same row, and whenever a request comes in for a particular dimension, doing a Get that adds only the required dimension's column. Since HBase caches data by block (BlockCache) rather than by individual value, only the 64 KB block (the default size) containing the required data will be cached.
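
To make the block-level caching concrete, here is a rough back-of-the-envelope sketch in plain Java. It does not use the HBase API; `BlockCacheSketch`, `assignBlocks`, and `cellsCachedWith` are hypothetical names introduced only for illustration. It assumes a simplified packing model: cells of one row are stored in order, a cell is never split across blocks, and a block is closed as soon as its accumulated size reaches the block size. Key overhead and compression are ignored, so real numbers will differ somewhat.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Simplified, assumed model of how the cells of one row are packed into blocks.
// This is an illustration only, not HBase's actual implementation.
class BlockCacheSketch {

    // Returns, for each cell in stored order, the index of the block it lands in.
    static int[] assignBlocks(int[] cellSizes, int blockSize) {
        int[] blockOfCell = new int[cellSizes.length];
        int block = 0;
        int used = 0;
        for (int i = 0; i < cellSizes.length; i++) {
            blockOfCell[i] = block;
            used += cellSizes[i];
            if (used >= blockSize) { // block is full: the next cell starts a new one
                block++;
                used = 0;
            }
        }
        return blockOfCell;
    }

    // Lists the cells that share a block with the requested cell, i.e. the cells
    // that end up in the cache as a side effect of reading that one cell.
    static List<Integer> cellsCachedWith(int requested, int[] blockOfCell) {
        List<Integer> cached = new ArrayList<>();
        for (int i = 0; i < blockOfCell.length; i++) {
            if (blockOfCell[i] == blockOfCell[requested]) {
                cached.add(i);
            }
        }
        return cached;
    }

    public static void main(String[] args) {
        int[] sizes = new int[20];            // 20 images, as in the question
        Arrays.fill(sizes, 10 * 1024);        // about 10 KB each
        int[] blocks = assignBlocks(sizes, 64 * 1024); // 64 KB default block size
        System.out.println("Blocks used: " + (blocks[blocks.length - 1] + 1));
        System.out.println("Cached when reading image 0: " + cellsCachedWith(0, blocks));
    }
}
```

Under these assumptions the 20 images spread over 3 blocks, and reading the first image caches 7 images (one block), not all 20. So a single column family does pull a few neighbouring images into the cache, but not the whole row.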

Having more than 2-3 column families per table will lead to performance problems, as explained here.

Hope this will help.

Alexander
  • By different columns, do you mean different columns in the same column family? If yes, HBase puts the full column family in memory (cache), and all my images of every dimension in the same row will be cached unnecessarily. That is the problem I already mentioned in the question. – Harsh Sharma Feb 26 '15 at 07:38
  • HBase will put the full column family in memory only if it is configured to do so at creation time. I thought the "cache" you mentioned in the question was the BlockCache. – Alexander Feb 26 '15 at 08:07
  • By "cache" I mean the block cache only. Can I get only selected columns of a column family into the block cache? I thought that if I do a get call for a particular column of a column family, the whole column family gets cached in the block cache. Isn't that right? – Harsh Sharma Feb 26 '15 at 08:22
  • From the HBase Administration Cookbook: "HBase supports block cache to improve read performance. When performing a scan, if block cache is enabled and there is room remaining, data blocks read from StoreFiles on HDFS are cached in region server's Java heap space, so that next time, accessing data in the same block can be served by the cached block. Block cache helps in reducing disk I/O for retrieving data." So HBase will cache the whole block containing the requested data, not just the single value that you request. – Alexander Feb 26 '15 at 08:52