
I've been working on a small task: converting Hive data to HFiles and bulk-loading them into HBase on a MapR cluster. The conversion itself works fine. The only issue I'm facing is that the MR job fails as the size of the Hive data grows. The job fails because it runs out of virtual memory, and it breaks once the Hive data exceeds about 10 GB.

All the data ends up on a single region server instead of being distributed across multiple region servers; this is a 10-node cluster. It looks like HBase hotspotting.

I've tried pre-splitting the table into multiple regions (NUMREGIONS => 256) and distributing the load evenly among them (SPLITALGO => 'UniformSplit'), but that doesn't resolve the issue. Does anybody have an idea how to resolve this hotspotting issue?
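For context, UniformSplit divides the raw byte space evenly, so it only balances load when the row keys themselves are spread across that space; keys that share a common prefix can still land in one region. A common workaround is to "salt" each row key with a leading byte derived from a hash of the key. The following is only a minimal sketch of that idea in plain Java, not the code used in this job: the class `SaltedKeys`, the helpers `bucketFor` and `salt`, and the `user%08d` key format are all hypothetical, and it assumes one salt byte with at most 256 buckets.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class SaltedKeys {

    // Hypothetical helper: map a key to a stable bucket in [0, buckets).
    // floorMod keeps the result non-negative even for negative hash codes.
    static int bucketFor(byte[] key, int buckets) {
        return Math.floorMod(Arrays.hashCode(key), buckets);
    }

    // Hypothetical helper: prepend a single salt byte to the original key.
    // Assumes 1 <= buckets <= 256 so the bucket fits in one byte.
    static byte[] salt(byte[] key, int buckets) {
        if (buckets < 1 || buckets > 256) {
            throw new IllegalArgumentException("buckets must be in 1..256");
        }
        byte[] out = new byte[key.length + 1];
        out[0] = (byte) bucketFor(key, buckets);
        System.arraycopy(key, 0, out, 1, key.length);
        return out;
    }

    public static void main(String[] args) {
        int buckets = 256; // matches NUMREGIONS => 256 from the question
        int[] counts = new int[buckets];

        // Made-up keys sharing a common prefix, the kind that tends to hotspot.
        for (int i = 0; i < 100_000; i++) {
            byte[] key = String.format("user%08d", i).getBytes(StandardCharsets.UTF_8);
            counts[salt(key, buckets)[0] & 0xFF]++;
        }

        int used = 0;
        for (int c : counts) {
            if (c > 0) {
                used++;
            }
        }
        System.out.println("buckets used: " + used + " of " + buckets);
    }
}
```

Because the salt is computed from the key itself, a point lookup can recompute it, but a range scan would have to fan out over every salt value, so this trades scan convenience for write distribution.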

Regards, Adil

knowone
  • I have a similar requirement and will try to look into your issue too. Can you post any useful links that you used for the above implementation? Thanks in advance. – Ramzy May 30 '15 at 05:39
  • No specific link, Ramzy; just simple Java routines. Hive & HBase are only used for table creation, insertion & deletion when needed; every other task is done with Java routines only. – knowone Jun 02 '15 at 09:18

0 Answers