I want to scan/query rows from HBase based on data size. I tried using scan.setMaxResultSize(100000), but I didn't get the expected result. Is there any way to achieve this?
-
What do you mean by "based on size"? Do you want to limit the number of rows returned by HBase? – MaxNevermind Nov 07 '16 at 15:51
-
@MaxNevermind, based on the data size. Let's say I have 50K entries totalling 400 MB; I need to read them in portions of about 1 MB per read. I agree that this takes 400 reads. According to the HBase documentation, setMaxResultSize(bytes) should do this, but it is not working. How can I limit the read by data size? – Harry Nov 08 '16 at 03:16
-
As I understand it, you need to read all of the rows from the table and divide them into portions of some size? If yes, then you can omit setMaxResultSize(), add setBatch(), and read all of the rows while counting them manually. – MaxNevermind Nov 08 '16 at 07:29
-
No, I need to use setMaxResultSize(). I don't want to load all the entries into memory; rather, I would like to load only a specified number of bytes into memory. @MaxNevermind – Harry Nov 08 '16 at 14:05
-
MaxResultSize limits the number of bytes the scanner fetches from the server per call, not the total it returns, so iterating the scanner still yields every row. Batch limits the number of cells each call to next() returns. Using batch you can limit how much is loaded in memory at once. – MaxNevermind Nov 08 '16 at 15:34
-
You don't need MaxResultSize. Just set batch, then read and count rows; when the count reaches a threshold, write what you have to the file system. If you want to start another file, just reset the row counter and keep doing the same thing. – MaxNevermind Nov 08 '16 at 15:41
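The read, count and flush loop suggested above can be sketched in plain Java. This is only an illustration under stated assumptions, not HBase code: the class SizeChunker and the method forEachChunk are hypothetical names, each row is stood in for by a byte[] whose length is taken as its size, and the threshold is counted in bytes rather than rows to match the question. In a real scan the iterator would come from a ResultScanner and the caller would have to estimate each row's size.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Consumer;

/** Hypothetical helper: groups rows into size-bounded chunks, one chunk in memory at a time. */
public class SizeChunker {

    /**
     * Reads rows one by one and hands them to sink in chunks of at most
     * maxBytes total. A single row larger than maxBytes becomes its own chunk.
     */
    public static void forEachChunk(Iterator<byte[]> rows, long maxBytes,
                                    Consumer<List<byte[]>> sink) {
        List<byte[]> current = new ArrayList<>();
        long currentBytes = 0;
        while (rows.hasNext()) {
            byte[] row = rows.next();
            // Flush before adding a row that would push the chunk over the limit.
            if (!current.isEmpty() && currentBytes + row.length > maxBytes) {
                sink.accept(current);
                current = new ArrayList<>();
                currentBytes = 0;
            }
            current.add(row);
            currentBytes += row.length;
        }
        // Flush whatever is left after the last row.
        if (!current.isEmpty()) {
            sink.accept(current);
        }
    }
}
```

The sink is where each chunk would be written to the file system. Because a new list is started after every flush, only about maxBytes of row data (plus at most one oversized row) is held at a time.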