Inefficient HBase record reader

Asked Mar 05 '17 at 10:45

Active Mar 06 '17 at 10:29

Viewed 57 times

I profiled my MR job and found that fetching the next records for the table scan takes ~30% of the time spent in the mapper. As far as I understand, the scanner fetches N rows from the server, as configured by scan.setCaching, and then iterates over them locally.

Is there anything I can do to minimize the cache load time? Is this a sign that the scan was set up incorrectly? Current setup:

scan caching = 100
record size = ~5kb
cf block size = ~130kb, compression=gz

I have considered writing a custom table record reader that prefetches the next batch in the background.
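Roughly what I have in mind, as a minimal sketch and not a real TableRecordReader: a wrapper that reads ahead from any iterator on a background thread while the caller processes the current element. It is plain Java with no HBase dependency; the class name `PrefetchingIterator` and the buffer size are hypothetical, and it assumes the source never yields null and is only touched by the worker thread.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

/** Hypothetical helper: wraps an iterator and reads ahead on a background thread. */
final class PrefetchingIterator<T> implements Iterator<T> {
    // Sentinel placed in the buffer when the source is exhausted or has failed.
    private static final Object END = new Object();

    private final BlockingQueue<Object> buffer;
    private final Thread worker;
    private volatile Throwable failure; // set by the worker before END on error
    private Object next;                // null means "nothing taken from the buffer yet"

    PrefetchingIterator(Iterator<T> source, int bufferSize) {
        this.buffer = new ArrayBlockingQueue<>(bufferSize);
        this.worker = new Thread(() -> {
            try {
                while (source.hasNext()) {
                    buffer.put(source.next()); // blocks while the buffer is full
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return; // consumer called close(); stop quietly
            } catch (Throwable t) {
                failure = t; // surfaced to the consumer in hasNext()
            }
            try {
                buffer.put(END);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, "prefetch-worker");
        worker.setDaemon(true);
        worker.start();
    }

    @Override
    public boolean hasNext() {
        if (next == null) {
            try {
                next = buffer.take(); // waits only if the worker has fallen behind
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting for prefetch", e);
            }
        }
        if (next == END && failure != null) {
            throw new IllegalStateException("prefetch failed", failure);
        }
        return next != END;
    }

    @Override
    @SuppressWarnings("unchecked")
    public T next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        T value = (T) next;
        next = null;
        return value;
    }

    /** Stops the background thread if the consumer abandons the iteration early. */
    void close() {
        worker.interrupt();
    }
}
```

In a record reader this would presumably wrap the scanner's result iterator. It can only hide fetch latency if the mapper does enough work per row to overlap with the next fetch, and it does not reduce the total amount of data transferred.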

edited Mar 06 '17 at 10:29

AdamSkywalker

It sounds like scan caching = 100 is quite reasonable; please verify your scan. Can you paste a sample of your scan statement and the table structure here? If you have column value filters, matching the values will take some time; if the scan is row-based, it will be faster. With the same scan caching size I was able to retrieve around 5 MB of data per record. I suspect the way you scan is the culprit. – Ram Ghadiyaram Mar 08 '17 at 09:53
Keep record size * caching = 1 MB. Is 5 KB the uncompressed size of the records? – KrazyGautam Apr 19 '17 at 16:28
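To make the rule of thumb in the last comment concrete, here is a small sketch of the arithmetic. It assumes ~5 KB is the uncompressed row size (which the question does not confirm) and treats KB and MB as binary units; the class and method names are hypothetical. The result is the kind of value one would pass to scan.setCaching.

```java
/** Hypothetical helper for the "record size * caching = 1 MB" rule of thumb. */
final class ScanCachingEstimate {
    /** Largest row count whose total size stays at or below targetBytes (at least 1). */
    static int rowsPerFetch(long targetBytes, long avgRowBytes) {
        if (targetBytes <= 0 || avgRowBytes <= 0) {
            throw new IllegalArgumentException("sizes must be positive");
        }
        long rows = targetBytes / avgRowBytes; // integer division rounds down
        return (int) Math.max(1L, Math.min((long) Integer.MAX_VALUE, rows));
    }

    /** Approximate payload of one scanner fetch for a given caching value. */
    static long bytesPerFetch(int caching, long avgRowBytes) {
        return caching * avgRowBytes;
    }

    public static void main(String[] args) {
        long rowBytes = 5L * 1024;     // ~5 KB per record, assumed uncompressed
        long target = 1024L * 1024;    // 1 MB target per fetch
        System.out.println("current bytes per fetch: " + bytesPerFetch(100, rowBytes));
        System.out.println("suggested caching: " + rowsPerFetch(target, rowBytes));
    }
}
```

Under these assumptions the current setup moves about 500 KB per fetch (100 * 5 KB), and the rule would suggest a caching value of roughly 200.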

0 Answers