Can anyone provide examples of how to read a DataFrame and a Dataset (in Spark 2.0) from Phoenix (both a complete table and via a query), and how to write a DataFrame and a Dataset (in Spark 2.0) to Phoenix, using Apache Spark in Java? There aren't any documented Java examples for these.

Also, please show multiple approaches if possible. For reading from Phoenix, I know of at least three ways:

  • Use PhoenixConfigurationUtil to set an input class and an input query, then read a newAPIHadoopRDD from the SparkContext (a rough sketch of this approach follows this list).
  • Use sqlContext.read().format("jdbc").options(...).load(), passing a map with configuration keys such as driver, url and dbtable.
  • Use sqlContext.read().format("org.apache.phoenix.spark").options(...).load(), passing a map with configuration keys such as url and table.
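The closest I have been able to piece together for the first approach is below. This is an untested sketch: MyWritable is a hypothetical class implementing DBWritable, and the exact PhoenixConfigurationUtil method names may differ between Phoenix versions.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.io.NullWritable;
import org.apache.phoenix.mapreduce.PhoenixInputFormat;
import org.apache.phoenix.mapreduce.util.PhoenixConfigurationUtil;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaSparkContext;

public class PhoenixRddRead {

    public static void main(String[] args) {
        JavaSparkContext sc = new JavaSparkContext(
                new SparkConf().setAppName("phoenix-rdd").setMaster("local[*]"));

        Configuration conf = HBaseConfiguration.create();
        conf.set("hbase.zookeeper.quorum", "ZK_QUORUM"); // placeholder quorum

        // MyWritable is a hypothetical class implementing DBWritable that maps
        // the selected columns onto Java fields.
        PhoenixConfigurationUtil.setInputClass(conf, MyWritable.class);
        PhoenixConfigurationUtil.setInputTableName(conf, "TABLE1");
        PhoenixConfigurationUtil.setInputQuery(conf, "SELECT ID, COL1 FROM TABLE1");

        // PhoenixInputFormat yields (NullWritable, MyWritable) pairs
        JavaPairRDD<NullWritable, MyWritable> rdd = sc.newAPIHadoopRDD(
                conf, PhoenixInputFormat.class, NullWritable.class, MyWritable.class);

        System.out.println("rows: " + rdd.count());
    }
}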

While searching, I found these approaches in other questions for Spark 1.6 with DataFrames, but the examples weren't complete; the methods were only present in bits and pieces, so I was not able to work out the complete steps. I couldn't find any example for Spark 2.0.

– Kiba

3 Answers


Here is an example of how to read from and write to Phoenix:

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.sql.DataFrame;
import org.apache.spark.sql.SQLContext;
import org.apache.spark.sql.SaveMode;

import com.google.common.collect.ImmutableMap;

import java.io.Serializable;

public class SparkConnection implements Serializable {

    public static void main(String args[]) {
        SparkConf sparkConf = new SparkConf();
        sparkConf.setAppName("spark-phoenix-df");
        sparkConf.setMaster("local[*]");
        JavaSparkContext sc = new JavaSparkContext(sparkConf);
        SQLContext sqlContext = new SQLContext(sc);
        // Read a whole Phoenix table over JDBC
        DataFrame fromPhx = sqlContext.read().format("jdbc")
                .options(ImmutableMap.of("driver", "org.apache.phoenix.jdbc.PhoenixDriver", "url",
                        "jdbc:phoenix:ZK_QUORUM:2181:/hbase-secure", "dbtable", "TABLE1"))
                .load();
        // Write it back to another Phoenix table via the phoenix-spark connector
        fromPhx.write().format("org.apache.phoenix.spark").mode(SaveMode.Overwrite)
                .options(ImmutableMap.of("driver", "org.apache.phoenix.jdbc.PhoenixDriver", "zkUrl",
                        "jdbc:phoenix:localhost:2181", "table", "RESULT"))
                .save();
    }
}
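On Spark 2.0 specifically, the same read/write can be expressed against a SparkSession, where a DataFrame is simply Dataset<Row>. A minimal sketch, assuming the phoenix-spark connector is on the classpath; the table names and ZK_QUORUM are placeholders:

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SaveMode;
import org.apache.spark.sql.SparkSession;

public class SparkPhoenix2 {

    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("spark2-phoenix-df")
                .master("local[*]")
                .getOrCreate();

        // Read a whole table; in Spark 2.0 a DataFrame is Dataset<Row>
        Dataset<Row> fromPhx = spark.read()
                .format("org.apache.phoenix.spark")
                .option("table", "TABLE1")
                .option("zkUrl", "ZK_QUORUM:2181")
                .load();

        // Write to another Phoenix table (the connector upserts on Overwrite)
        fromPhx.write()
                .format("org.apache.phoenix.spark")
                .mode(SaveMode.Overwrite)
                .option("table", "RESULT")
                .option("zkUrl", "ZK_QUORUM:2181")
                .save();
    }
}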
– ROOT


In Scala, this can be done as follows:

import org.apache.phoenix.spark._

val sqlContext = spark.sqlContext
val df1 = sqlContext.read.format("jdbc")
  .options(Map(
    "driver" -> "org.apache.phoenix.jdbc.PhoenixDriver",
    "url" -> "jdbc:phoenix:zk4-habsem.lzmf1fzmprtezol2fr25obrdth.jx.internal.cloudapp.net,zk5-habsem.lzmf1fzmprtezol2fr25obrdth.jx.internal.cloudapp.net,zk1-habsem.lzmf1fzmprtezol2fr25obrdth.jx.internal.cloudapp.net:2181:/hbase-unsecure",
    "dbtable" -> "table_name"))
  .load()
– sachingupta


This page https://github.com/apache/phoenix/tree/master/phoenix-spark shows how to load Phoenix tables as an RDD or a DataFrame, along with other examples.

For example, to load a table as a DataFrame:

import org.apache.spark.SparkContext
import org.apache.spark.sql.SQLContext
import org.apache.phoenix.spark._

val sc = new SparkContext("local", "phoenix-test")
val sqlContext = new SQLContext(sc)

val df = sqlContext.load(
  "org.apache.phoenix.spark", 
  Map("table" -> "TABLE1", "zkUrl" -> "phoenix-server:2181")
)

df
  .filter(df("COL1") === "test_row_1" && df("ID") === 1L)
  .select(df("ID"))
  .show
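Since the question asks for Java, the equivalent read plus filter/select with the Spark 2.0 Dataset API would look roughly like this (a sketch, assuming an existing SparkSession named spark; the table name and "phoenix-server:2181" come from the Scala example above):

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;

Dataset<Row> df = spark.read()
        .format("org.apache.phoenix.spark")
        .option("table", "TABLE1")
        .option("zkUrl", "phoenix-server:2181")
        .load();

// Same filter and projection as the Scala snippet above
df.filter(df.col("COL1").equalTo("test_row_1").and(df.col("ID").equalTo(1L)))
  .select(df.col("ID"))
  .show();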

The gist below is a complete example using Java:

https://gist.github.com/mravi/444afe7f49821819c987

– Shankar

  • Can you provide examples of the other ways of reading from Phoenix that I mentioned in my question, using sqlContext? There are also other ways to write, like the saveToPhoenix method in the DataFrameFunctions class and dataFrame.write(). It'll be really helpful; I just want to know 2 or 3 ways in which I can read and write. I have only seen bits and pieces of the other ways, never a complete example. – Kiba Oct 30 '16 at 16:17
  • @snsancar And also examples with Spark 2.0 Datasets. – Kiba Oct 30 '16 at 16:24
  • @snsancar Thanks for the quick reply, but the example in the link you gave is with RDDs; I wanted the read and write operations with DataFrames and Datasets. The ones you gave in the answer are of no use to me; I have already seen those in the documentation, and they are in Scala. I wanted Java. – Kiba Oct 30 '16 at 16:47
  • @Kiba: I gave you the Java example as well: https://gist.github.com/mravi/444afe7f49821819c987 – Shankar Oct 30 '16 at 17:10
  • @snsancar Yes, but that's with RDDs; I wanted DataFrames and Datasets. I have seen that example, and it is not with DataFrames and Datasets, which is why I posted this question. I have seen other people use sqlContext to read DataFrames and dataFrame.write() to write to Phoenix in their questions, but they haven't written out the complete steps. – Kiba Oct 30 '16 at 17:31