Has anybody succeeded in loading data from Bigtable via Pig on Dataproc using HBaseStorage?
Here's a very simple Pig script I'm trying to run. It fails with an error indicating that it can't find the BigtableConnection class, and I'm wondering what setup I may be missing to load data from Bigtable successfully.
raw = LOAD 'hbase://my_hbase_table'
USING org.apache.pig.backend.hadoop.hbase.HBaseStorage(
'cf:*', '-minTimestamp 1490104800000 -maxTimestamp 1490105100000 -loadKey true -limit 5')
AS (key:chararray, data);
DUMP raw;
Steps I followed to set up my cluster:
- Launched Bigtable cluster (my_bt); created and populated my_hbase_table
- Launched Dataproc cluster (my_dp) via the Cloud Dataproc Console on cloud.google.com
- Installed HBase shell on the Dataproc master (/opt/hbase-1.2.1) following the instructions at https://cloud.google.com/bigtable/docs/installing-hbase-shell
- Added properties to hbase-site.xml for my_bt and BigtableConnection class
- Created file t.pig with contents listed above
- Invoked Pig via command:
gcloud beta dataproc jobs submit pig --cluster my_dp --file t.pig --jars /opt/hbase-1.2.1/lib/bigtable/bigtable-hbase-1.2-0.9.5.1.jar
- Got the following error indicating that the BigtableConnection class was not found:
2017-03-21 15:30:48,029 [JobControl] ERROR org.apache.hadoop.hbase.mapreduce.TableInputFormat - java.io.IOException: java.lang.ClassNotFoundException: com.google.cloud.bigtable.hbase1_2.BigtableConnection
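For reference, here is a minimal sketch of the kind of hbase-site.xml additions described in the steps above. It assumes the standard property names used by the Cloud Bigtable HBase client. The project ID my-project is a placeholder, and using my_bt as the instance ID is an assumption, so the actual file may differ.

```xml
<configuration>
  <!-- Tells the HBase client to use the Bigtable connection implementation
       (the class named in the error above). -->
  <property>
    <name>hbase.client.connection.impl</name>
    <value>com.google.cloud.bigtable.hbase1_2.BigtableConnection</value>
  </property>
  <!-- Placeholder project ID (assumption, not the real value). -->
  <property>
    <name>google.bigtable.project.id</name>
    <value>my-project</value>
  </property>
  <!-- Assumes my_bt is the Bigtable instance ID. -->
  <property>
    <name>google.bigtable.instance.id</name>
    <value>my_bt</value>
  </property>
</configuration>
```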