I'm trying to use a Dataproc cluster to import large CSV files into HDFS, export them to SequenceFile format, and finally import those SequenceFiles into Bigtable as described here: https://cloud.google.com/bigtable/docs/exporting-importing
I initially loaded the CSV files as an external table in Hive, then exported them by inserting the rows into a SequenceFile-backed table.
However (possibly because Dataproc seems to ship with Hive 1.0?), I hit the cast exception mentioned here: Bigtable import error
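To check which key and value classes a SequenceFile was actually written with, the file header can be read directly. Below is a minimal Python sketch, not an official tool. It assumes a SequenceFile format version of 4 or later, where the class names are stored as Hadoop `Text` strings; the function names are my own. As far as I can tell, the Bigtable import expects `org.apache.hadoop.hbase.io.ImmutableBytesWritable` keys and `org.apache.hadoop.hbase.client.Result` values, whereas a Hive SequenceFile table writes `org.apache.hadoop.io.BytesWritable` and `org.apache.hadoop.io.Text`, which would explain the cast exception.

```python
import io
import struct


def read_vint(stream):
    """Decode a Hadoop variable-length integer (same scheme as WritableUtils.readVLong)."""
    first = struct.unpack("b", stream.read(1))[0]  # first byte, interpreted as signed
    if first >= -112:
        return first  # small values are stored in a single byte
    negative = first < -120
    # Number of payload bytes that follow the first byte.
    size = (-120 - first) if negative else (-112 - first)
    value = int.from_bytes(stream.read(size), "big")
    return ~value if negative else value


def read_text(stream):
    """Read a string written by Hadoop's Text.writeString: vint length + UTF-8 bytes."""
    length = read_vint(stream)
    return stream.read(length).decode("utf-8")


def sequence_file_classes(stream):
    """Return (key_class, value_class) from the header of a SequenceFile stream."""
    if stream.read(3) != b"SEQ":
        raise ValueError("not a SequenceFile (missing SEQ magic)")
    version = stream.read(1)[0]
    if version < 4:
        # Assumption: older versions store class names differently; not handled here.
        raise ValueError("unsupported SequenceFile version: %d" % version)
    key_class = read_text(stream)
    value_class = read_text(stream)
    return key_class, value_class
```

To use it, copy one output file out of HDFS to local disk, open it with `open(path, "rb")`, and pass the file object to `sequence_file_classes`. Only the first few hundred bytes of the file are needed.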
I can't seem to get the HBase shell or ZooKeeper up and running on the Dataproc master VM, so I can't run a simple export job from the CLI.
Is there an alternative way to export Bigtable-compatible SequenceFiles from Dataproc?
What is the proper configuration to get HBase and ZooKeeper running on the Dataproc master node?