HDFS file paths in HBase

Question

We have a source of files, each ranging from a few MB to a few GB in size. Each file is uniquely named and can be mapped to a person. However, the person information comes from different sources and is not stored in the file system.

Now we have a requirement to move all the files to HDFS and build a UI to add person information to each file, so that files can later be searched by person information.

I am thinking of moving the files every night using WebHDFS (so that we can secure the cluster with Knox) and building a UI that adds person information to HBase and links each person to the appropriate file (the user would map a file name to a person). Each HBase record would hold the person information and the path of the HDFS file.

I am wondering whether this architecture has any negative implications. Is it okay to store HDFS file paths in HBase records?
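
To make the design concrete, here is a rough sketch of what I have in mind, not a working implementation. The gateway host, the `default` topology name, the column families `person` and `file`, and both function names are assumptions made up for illustration. The first function only builds the WebHDFS `CREATE` URL that the nightly job would send a PUT request to through Knox. The second only shapes the cells of one HBase row as a plain dictionary, without talking to a cluster.

```python
from urllib.parse import quote, urlencode


def webhdfs_create_url(gateway_base, hdfs_path, overwrite=False):
    """Build the WebHDFS CREATE URL for one file.

    gateway_base is assumed to be a Knox-style base URL such as
    https://<knox-host>:8443/gateway/<topology> (hypothetical values).
    """
    query = urlencode({"op": "CREATE", "overwrite": str(overwrite).lower()})
    # quote() keeps "/" intact and escapes spaces and other unsafe characters.
    return f"{gateway_base.rstrip('/')}/webhdfs/v1{quote(hdfs_path)}?{query}"


def person_file_row(person_id, person_info, hdfs_path):
    """Shape one HBase row as (row key, {"family:qualifier": value}).

    The "person" and "file" column families are assumptions, not a fixed schema.
    """
    cells = {f"person:{field}": str(value) for field, value in person_info.items()}
    # The file itself stays in HDFS; the row stores only its path.
    cells["file:hdfs_path"] = hdfs_path
    return person_id, cells


if __name__ == "__main__":
    base = "https://knox.example.com:8443/gateway/default"  # hypothetical gateway
    path = "/data/files/scan 0001.bin"                       # hypothetical file
    print(webhdfs_create_url(base, path))
    print(person_file_row("person-42", {"name": "Jane Doe", "source": "crm"}, path))
```

Keeping person fields as free-form qualifiers in one column family is meant to cover the case where different sources supply different attributes. The path is stored as an ordinary string cell, so moving or renaming a file in HDFS would mean updating the row as well.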

Are you sure you need HBase for that? Won't everything fit in a regular database (e.g. MySQL)? — facha, Feb 02 '16 at 22:17
@facha Person data will differ depending on the source, so we considered Mongo first. However, we thought HBase could be helpful if we want to implement analytics use cases that require both files and person information. — user3600073, Feb 02 '16 at 22:29
If the number of people in the tables will not exceed a million, I think MongoDB is an easy way to search on different fields. You are describing typical JSON-format data. — halil, Feb 03 '16 at 07:31
Whether HBase is a good choice depends on how you are going to use it. HBase performs well on "find a needle in a haystack" queries (lookups of a single row of data). HBase performs poorly on analytical queries (where you need to scan the whole dataset and aggregate it in some way). — facha, Feb 03 '16 at 07:41
Thanks for the response. Is it usual to store HDFS file paths in databases that are used for OLTP? — user3600073, Feb 03 '16 at 12:42
