
Can someone point me to a working example of saving a CSV file to an HBase table using Spark 2.2? Options that I tried and failed with (note: all of them work for me with Spark 1.6):

  1. phoenix-spark
  2. hbase-spark
  3. it.nerdammer.bigdata : spark-hbase-connector_2.10

All of them, after I fixed everything else, finally give an error similar to the one in this question: Spark HBase

Thanks

abstractKarshit

2 Answers


Add the following parameters to your Spark job:

    spark-submit \
      --conf "spark.yarn.stagingDir=/somelocation" \
      --conf "spark.hadoop.mapreduce.output.fileoutputformat.outputdir=/somelocation" \
      --conf "spark.hadoop.mapred.output.dir=/somelocation"
Rahul Sharma
  • I have set up HBase and Phoenix locally and did what you said, adding these configs in the code itself. Again, I got the same error. In both cases, with and without the config, the data is loaded successfully and then it gives me the error. – abstractKarshit Sep 28 '17 at 22:31
  • Based on my analysis, the job fails at `HadoopMapReduceCommitProtocol.absPathStagingDir` because the output path is empty, even though it is supplied correctly via the `mapreduce.output.fileoutputformat.outputdir` parameter. The Hadoop configuration is populated into hadoopConf using SparkHadoopUtil and everything looks correct to me. Can you please add these params to the SparkConf object as well: `spark.hadoop.mapreduce.output.dir`, `spark.hadoop.mapred.output.fileoutputformat.outputdir` – Rahul Sharma Sep 30 '17 at 05:30

Phoenix has a Spark plugin and a JDBC thin client, both of which can connect to (read from / write to) HBase. Examples are at https://phoenix.apache.org/phoenix_spark.html

Option 1: connect via the ZooKeeper URL (phoenix-spark plugin)

    import org.apache.spark.SparkContext
    import org.apache.spark.sql.SQLContext
    import org.apache.phoenix.spark._

    val sc = new SparkContext("local", "phoenix-test")
    val sqlContext = new SQLContext(sc)

    // Load the Phoenix table TABLE1 as a DataFrame via the phoenix-spark data source.
    // sqlContext.load(...) was removed in Spark 2.x, so use the DataFrameReader API.
    val df = sqlContext.read
      .format("org.apache.phoenix.spark")
      .options(Map("table" -> "TABLE1", "zkUrl" -> "phoenix-server:2181"))
      .load()

    df
      .filter(df("COL1") === "test_row_1" && df("ID") === 1L)
      .select(df("ID"))
      .show()
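Since the question is about saving a CSV, the same plugin can also write. A minimal sketch, assuming a CSV at /tmp/input.csv (hypothetical path) whose columns match a pre-created Phoenix table OUTPUT_TABLE (also hypothetical); note that phoenix-spark requires SaveMode.Overwrite, which performs upserts rather than truncating the table:

    import org.apache.spark.sql.SaveMode

    // Read the CSV with Spark 2.x's built-in csv source; the path is a placeholder
    val csvDf = sqlContext.read
      .option("header", "true")
      .option("inferSchema", "true")
      .csv("/tmp/input.csv")

    // Upsert the rows into the (hypothetical) Phoenix table OUTPUT_TABLE
    csvDf.write
      .format("org.apache.phoenix.spark")
      .mode(SaveMode.Overwrite)
      .options(Map("table" -> "OUTPUT_TABLE", "zkUrl" -> "phoenix-server:2181"))
      .save()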

Option 2: use the JDBC thin client provided by the Phoenix Query Server

More info at https://phoenix.apache.org/server.html

jdbc:phoenix:thin:url=http://localhost:8765;serialization=PROTOBUF
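A minimal sketch of using that URL from plain JDBC, assuming the Phoenix thin-client jar (which provides org.apache.phoenix.queryserver.client.Driver) is on the classpath and reusing TABLE1 from the example above:

    import java.sql.DriverManager

    // Load the Phoenix thin-client driver shipped with the query server client jar
    Class.forName("org.apache.phoenix.queryserver.client.Driver")

    val conn = DriverManager.getConnection(
      "jdbc:phoenix:thin:url=http://localhost:8765;serialization=PROTOBUF")
    try {
      val stmt = conn.createStatement()
      // UPSERT is Phoenix's combined insert/update statement
      stmt.executeUpdate("UPSERT INTO TABLE1 (ID, COL1) VALUES (1, 'test_row_1')")
      conn.commit() // Phoenix connections do not autocommit by default
    } finally {
      conn.close()
    }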
Augustine