I've been working on a small task: converting Hive data to HFiles and bulk-loading them into HBase, on the MapR distribution. The conversion itself works fine. The only problem is that the MapReduce job fails as the Hive data grows: it runs out of virtual memory and breaks once the Hive data exceeds 10 GB.
All of the data goes to a single region server instead of being distributed across the region servers, even though this is a 10-node cluster. It looks like HBase hotspotting.
I've tried pre-splitting the table into multiple regions (NUMREGIONS => 256) and spreading the load evenly among them (SPLITALGO => 'UniformSplit'), but that doesn't resolve the issue. Does anybody have an idea how to resolve this hotspotting?
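One possible reason the pre-split did not help: HBase documents UniformSplit as dividing the space of possible keys evenly, which suits keys that are roughly uniform random bytes (such as hashes). The Python sketch below is a simplified model under that assumption, not HBase's RegionSplitter code. It assigns a key to one of 256 regions by its first byte only, and the zero-padded numeric keys are hypothetical stand-ins for the real row keys. Text keys that share a first character all land in one region, while hash-prefixed keys spread across all of them.

```python
import hashlib
from collections import Counter

NUM_REGIONS = 256  # matches NUMREGIONS => 256 from the post


def region_for_key(row_key: bytes, num_regions: int = NUM_REGIONS) -> int:
    """Simplified model of evenly spaced split points over the raw byte space.

    Only the first byte is considered; an empty key goes to the first region.
    """
    if not row_key:
        return 0
    return row_key[0] * num_regions // 256


def region_histogram(keys, num_regions: int = NUM_REGIONS) -> Counter:
    """Count how many keys fall into each (modelled) region."""
    return Counter(region_for_key(k, num_regions) for k in keys)


# Hypothetical row keys: zero-padded numeric ids stored as ASCII text.
ascii_keys = [f"{i:010d}".encode() for i in range(100_000)]

# The same keys with an MD5 digest prepended, so the first byte looks random.
hashed_keys = [hashlib.md5(k).digest() + k for k in ascii_keys]

print(len(region_histogram(ascii_keys)))   # every key starts with b"0": one region
print(len(region_histogram(hashed_keys)))  # first bytes spread across the regions
```

Modelling only the first byte is a reasonable approximation here because, with 256 evenly spaced regions, each boundary falls close to a first-byte value.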
Regards, Adil