
The HBase articles and books I have read all say the following about the Meta and FileInfo blocks in HFiles:

"The Meta block is designed to keep a large amount of data with its key as a String, while FileInfo is a simple Map preferred for small information with keys and values that are both byte-array." or "Metadata blocks are expensive. Fill one with a bunch of serialized data rather than do a metadata block per metadata instance. If metadata is small, consider adding to file info."

I want to understand why they say that. What is the design reason that large data should be kept in Meta blocks and small data in FileInfo?

The reason I ask is that our project stores some information in FileInfo. Over time that information has grown, and we now have up to 15-20 MB of data in FileInfo. From the text above it seems we should not be doing this, but we don't know what impact, if any, it has on our system.

Can someone please shed some light on this? I've looked at the HFile and FileInfo code and couldn't find any obvious reason.

Peter Haddad
anuragz

1 Answer


It seems this was a silly question after all. The FileInfo block is a so-called "load-on-open" block: as the name suggests, it is loaded when the file is opened. So if you keep a large amount of data in FileInfo, it is loaded into memory even if you never need it. A Meta block, on the other hand, can be loaded on demand. So if you have large data, it is better to put it in a Meta block than in FileInfo.
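To make the difference concrete, here is a minimal toy model in plain Java. It does not use the HBase API at all: `ToyReader`, `open`, `getMetaBlock`, `residentBytes` and the keys `SMALL_FLAG` and `PAYLOAD` are hypothetical names made up for illustration, and the "disk read" is just a `Supplier` that allocates a byte array. It only sketches the idea that file-info entries are read eagerly at open time while meta blocks are read when asked for.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

/** Toy model of load-on-open vs. on-demand blocks. NOT HBase code. */
public class LoadOnOpenSketch {

    /** Hypothetical reader that tracks how many bytes it has pulled into memory. */
    static final class ToyReader {
        private final Map<String, byte[]> fileInfo = new HashMap<>();
        private final Map<String, Supplier<byte[]>> metaLoaders;
        private long residentBytes = 0;

        ToyReader(Map<String, Supplier<byte[]>> fileInfoLoaders,
                  Map<String, Supplier<byte[]>> metaLoaders) {
            // "Open": every file-info entry is read immediately, needed or not.
            for (Map.Entry<String, Supplier<byte[]>> e : fileInfoLoaders.entrySet()) {
                byte[] value = e.getValue().get();
                fileInfo.put(e.getKey(), value);
                residentBytes += value.length;
            }
            // Meta blocks are only remembered, not read yet.
            this.metaLoaders = metaLoaders;
        }

        /** Returns a file-info value that was already loaded at open time. */
        byte[] getFileInfo(String key) {
            return fileInfo.get(key);
        }

        /** Reads a meta block on demand (no caching in this sketch). */
        byte[] getMetaBlock(String name) {
            Supplier<byte[]> loader = metaLoaders.get(name);
            if (loader == null) {
                return null;
            }
            byte[] block = loader.get();
            residentBytes += block.length;
            return block;
        }

        long residentBytes() {
            return residentBytes;
        }
    }

    /** Opens a toy file whose large payload lives either in file info or in a meta block. */
    static ToyReader open(boolean payloadInFileInfo, int payloadSize) {
        Map<String, Supplier<byte[]>> info = new HashMap<>();
        Map<String, Supplier<byte[]>> meta = new HashMap<>();
        info.put("SMALL_FLAG", () -> new byte[8]);           // small entry, fine in file info
        Supplier<byte[]> payload = () -> new byte[payloadSize];
        if (payloadInFileInfo) {
            info.put("PAYLOAD", payload);
        } else {
            meta.put("PAYLOAD", payload);
        }
        return new ToyReader(info, meta);
    }

    public static void main(String[] args) {
        int size = 16 * 1024 * 1024; // roughly the size mentioned in the question

        ToyReader eager = open(true, size);
        System.out.println("payload in file info, after open: " + eager.residentBytes());

        ToyReader lazy = open(false, size);
        System.out.println("payload in meta block, after open: " + lazy.residentBytes());

        lazy.getMetaBlock("PAYLOAD");
        System.out.println("payload in meta block, after read: " + lazy.residentBytes());
    }
}
```

In this model the reader with the payload in file info holds the whole payload as soon as it is opened, while the other one holds only the 8-byte flag until `getMetaBlock` is called. In a real system that open-time cost would presumably be paid for every file that is opened, which is why the size of FileInfo matters.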

anuragz