I recently started learning HBase coprocessors, and I used an endpoint to accumulate one column of an HBase table. For example, the table is named "pendings", its column family is "asset", and I accumulate all the values of "asset:amount". The table has other columns, such as "asset:customer_name". What I actually want is to sum the values of "asset:amount" grouped by "asset:customer_name". However, I found there is no API for GROUP BY, or I have not found it. Do you know how to implement GROUP BY, or how to use the API that HBase provides for it?
2 Answers
You should use an endpoint coprocessor for this.
There is a sum example in this article: https://blogs.apache.org/hbase/entry/coprocessor_introduction.
Basically, you need to combine your row key and the customer name to form a new key, "MyKey". Keep a variable holding the last seen MyKey; whenever the current MyKey differs from the previous one, emit the previous one along with its sum, then set the previous MyKey to the current one. This emit-on-change approach only works if rows with the same customer name come out of the scan next to each other.
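A minimal, HBase-free sketch of the "last seen MyKey" loop, as it might run inside one region. It assumes the region's scan yields (customer_name, amount) pairs in which rows for the same customer are adjacent (for example, because the row key starts with the customer name). The class and method names are hypothetical, and plain Java lists stand in for the region scanner.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical simulation of the region-side loop; NOT the HBase API.
// Assumption: rows of the same customer are adjacent in scan order.
class RegionGroupSum {

    // Emits one (customer, sum) entry each time the customer changes.
    static List<Map.Entry<String, Long>> groupSum(List<Map.Entry<String, Long>> rows) {
        List<Map.Entry<String, Long>> out = new ArrayList<>();
        String previousKey = null; // the last seen MyKey
        long runningSum = 0;
        for (Map.Entry<String, Long> row : rows) {
            String currentKey = row.getKey();
            if (previousKey != null && !previousKey.equals(currentKey)) {
                // Key changed: emit the finished group and start a new sum.
                out.add(Map.entry(previousKey, runningSum));
                runningSum = 0;
            }
            previousKey = currentKey;
            runningSum += row.getValue();
        }
        if (previousKey != null) {
            out.add(Map.entry(previousKey, runningSum)); // flush the last group
        }
        return out;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Long>> scan = List.of(
                Map.entry("alice", 10L), Map.entry("alice", 5L),
                Map.entry("bob", 7L),
                Map.entry("carol", 1L), Map.entry("carol", 2L));
        System.out.println(groupSum(scan)); // [alice=15, bob=7, carol=3]
    }
}
```

If the rows were not adjacent by customer, the same customer would be emitted more than once, which is one more reason the client has to merge results.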
Make sure to perform the final aggregation on the client side, as the example in the article does, because a customer's rows may span the boundary between two regions, in which case each region returns only a partial sum for that customer.
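A sketch of the client-side merge step, assuming each region has already returned its partial (customer, sum) map. The region results are plain Java maps here, so the merge can be shown without an HBase cluster; the class name is hypothetical.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical client-side merge; each map is one region's partial result.
class ClientMerge {

    // Adds up partial sums for the same customer coming from different regions.
    static Map<String, Long> merge(List<Map<String, Long>> perRegion) {
        Map<String, Long> total = new TreeMap<>(); // sorted for readable output
        for (Map<String, Long> partial : perRegion) {
            for (Map.Entry<String, Long> e : partial.entrySet()) {
                total.merge(e.getKey(), e.getValue(), Long::sum);
            }
        }
        return total;
    }

    public static void main(String[] args) {
        // "bob" straddles the boundary between region 1 and region 2.
        Map<String, Long> region1 = Map.of("alice", 15L, "bob", 4L);
        Map<String, Long> region2 = Map.of("bob", 3L, "carol", 3L);
        System.out.println(merge(List.of(region1, region2))); // {alice=15, bob=7, carol=3}
    }
}
```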

You can do this with an endpoint coprocessor. All you need to do is: first, define the related interface (the reduce protocol) extending CoprocessorProtocol; then write an implementation of it; and finally, code the client-side logic.
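The three steps can be sketched without a cluster as follows. This is only an in-memory illustration: GroupSumProtocol, FakeRegion and GroupSumClient are hypothetical names, a plain interface stands in for one that would extend CoprocessorProtocol, and a list of column maps stands in for a real region scan. In HBase 0.92/0.94 the client side would typically go through HTable.coprocessorExec rather than a loop over regions.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Step 1: the "reduce" protocol. In HBase this interface would extend
// CoprocessorProtocol; here it is a plain interface (assumption).
interface GroupSumProtocol {
    Map<String, Long> groupSum(String family, String groupQualifier, String valueQualifier);
}

// Step 2: region-side implementation over in-memory rows (column -> value).
class FakeRegion implements GroupSumProtocol {
    private final List<Map<String, String>> rows;

    FakeRegion(List<Map<String, String>> rows) { this.rows = rows; }

    @Override
    public Map<String, Long> groupSum(String family, String groupQualifier, String valueQualifier) {
        Map<String, Long> partial = new TreeMap<>();
        for (Map<String, String> row : rows) {
            String group = row.get(family + ":" + groupQualifier);
            String value = row.get(family + ":" + valueQualifier);
            if (group == null || value == null) continue; // skip incomplete rows
            partial.merge(group, Long.parseLong(value), Long::sum);
        }
        return partial;
    }
}

// Step 3: client-side logic: call every region, then merge the partial sums.
class GroupSumClient {
    static Map<String, Long> run(List<? extends GroupSumProtocol> regions) {
        Map<String, Long> total = new TreeMap<>();
        for (GroupSumProtocol region : regions) {
            region.groupSum("asset", "customer_name", "amount")
                  .forEach((k, v) -> total.merge(k, v, Long::sum));
        }
        return total;
    }

    public static void main(String[] args) {
        FakeRegion r1 = new FakeRegion(List.of(
                Map.of("asset:customer_name", "alice", "asset:amount", "10"),
                Map.of("asset:customer_name", "bob", "asset:amount", "4")));
        FakeRegion r2 = new FakeRegion(List.of(
                Map.of("asset:customer_name", "bob", "asset:amount", "3")));
        System.out.println(run(List.of(r1, r2))); // {alice=10, bob=7}
    }
}
```

Unlike the emit-on-change loop in the other answer, the region side here groups with a map, so it does not need rows to be sorted by customer; the trade-off is holding one entry per distinct customer in memory for each region.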
