I have a use case where I need to read from HBase inside a PySpark job, and I am currently doing a scan on the HBase table like this:
conf = {"hbase.zookeeper.quorum": host, "hbase.cluster.distributed": "true", "hbase.mapreduce.inputtable": "table_name", "hbase.mapreduce.scan.row.start": start, "hbase.mapreduce.scan.row.stop": stop}
rdd = sc.newAPIHadoopRDD("org.apache.hadoop.hbase.mapreduce.TableInputFormat", "org.apache.hadoop.hbase.io.ImmutableBytesWritable","org.apache.hadoop.hbase.client.Result", keyConverter=keyConv, valueConverter=valueConv,conf=cmdata_conf)
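For reference, keyConv and valueConv point at converter classes. A minimal sketch of what that setup looks like, assuming the stock converters shipped with the Spark examples (the spark-examples jar needs to be on the classpath):

# Assumption: the stock converter classes from the Spark examples module.
keyConv = "org.apache.spark.examples.pythonconverters.ImmutableBytesWritableToStringConverter"
valueConv = "org.apache.spark.examples.pythonconverters.HBaseResultToStringConverter"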
I am unable to find a conf setting that does a GET on the HBase table. Can someone help me? From what I can find, filters are not supported with PySpark, but is it really not possible to do a simple GET?
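The closest workaround I can think of is narrowing the scan to exactly one row: since hbase.mapreduce.scan.row.start is inclusive and hbase.mapreduce.scan.row.stop is exclusive, appending a zero byte to the row key should restrict the range to that single row. A sketch under that assumption (row_key is a hypothetical string row key):

# Hypothetical single-row "GET" emulated as a one-row scan:
# start is inclusive, stop is exclusive, so the range
# [row_key, row_key + "\x00") matches exactly row_key.
row_key = "my-row"  # hypothetical row key
get_conf = {
    "hbase.zookeeper.quorum": host,
    "hbase.mapreduce.inputtable": "table_name",
    "hbase.mapreduce.scan.row.start": row_key,
    "hbase.mapreduce.scan.row.stop": row_key + "\x00",
}
rdd = sc.newAPIHadoopRDD(
    "org.apache.hadoop.hbase.mapreduce.TableInputFormat",
    "org.apache.hadoop.hbase.io.ImmutableBytesWritable",
    "org.apache.hadoop.hbase.client.Result",
    keyConverter=keyConv,
    valueConverter=valueConv,
    conf=get_conf,
)
row = rdd.collect()  # expect at most one (key, value) pair

If there is a cleaner way than abusing the scan range like this, I would prefer that.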
Thanks!