I am new to the Spark world. How can a Hive table stored with a JSON SerDe be read via Spark SQL? Any example piece of code or document would help.
-
Try setting `spark = SparkSession.builder.enableHiveSupport().getOrCreate()` while creating the Spark session if you are using `spark 2.0+` – User12345 May 19 '20 at 16:57
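For context, a minimal Scala sketch of what that comment suggests could look like the following. The database and table names are hypothetical placeholders, and it assumes the JSON SerDe jar (e.g. hive-hcatalog-core) is already on the classpath:

import org.apache.spark.sql.SparkSession

val spark = SparkSession
  .builder()
  .appName("ReadHiveJsonTable")
  .enableHiveSupport() // let Spark use the Hive metastore, so Hive itself resolves the SerDe
  .getOrCreate()

// my_db.my_json_table is a placeholder; once Hive support is enabled,
// a JSON-SerDe table can be queried like any other Hive table
spark.sql("SELECT * FROM my_db.my_json_table").show()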
1 Answer
import org.apache.log4j.{Level, Logger}
import org.apache.spark.sql.SparkSession

object ReadJson {

  val spark = SparkSession // Build the SparkSession
    .builder()
    .appName("ReadJson")
    .master("local[*]")
    .config("spark.sql.shuffle.partitions", "4") // a more reasonable default number of partitions for our data
    .config("spark.app.id", "ReadJson") // to silence the Metrics warning
    .getOrCreate()

  val sc = spark.sparkContext // get the Spark context
  val sqlContext = spark.sqlContext // get the Spark SQL context

  val input = "hdfs://user/..../..../..../file.json" // HDFS path to the file or directory

  def main(args: Array[String]): Unit = {
    Logger.getRootLogger.setLevel(Level.ERROR) // keep application logging quiet
    try {
      val jsonDf = sqlContext
        .read
        .json(input) // read the JSON file into a DataFrame
      jsonDf.show(truncate = false) // show some rows in the console
      jsonDf.createOrReplaceTempView("my_table") // to work with SQL, first create a temporary view
      sqlContext.sql("""SELECT * FROM my_table""").show() // simple query
      // Keep the app alive so you can inspect the Spark web UI, usually at http://localhost:4040/
      println("Type anything in the console to exit......")
      scala.io.StdIn.readLine()
    } finally {
      sc.stop()
      println("SparkContext stopped")
      spark.stop()
      println("SparkSession stopped")
    }
  }
}
Spark programming guide
http://spark.apache.org/docs/2.3.0/sql-programming-guide.html#overview

Chema
-
Actually I am not reading directly from a JSON file; the data is stored in a Hive table in the JSON SerDe format and I don't know how to read the data from there. – Khushboo Yadav May 19 '20 at 13:53
-
You could try that approach. I suppose the data are stored in JSON format, so you read the files, make changes or aggregations with Spark, and save the data, overwriting the directory; Hive will be able to read the changes. Another approach is to enable Hive support, set the Hive warehouse dir, and read the tables directly. What version of Spark are you using? You can follow this link: http://spark.apache.org/docs/2.3.0/sql-programming-guide.html#hive-tables. – Chema May 19 '20 at 14:02
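To make that second approach concrete, here is a hedged Scala sketch; the warehouse path, metastore URI, and table name are all hypothetical placeholders and would need to match your cluster:

import org.apache.spark.sql.SparkSession

val spark = SparkSession
  .builder()
  .appName("HiveJsonSerdeRead")
  .master("local[*]")
  .config("spark.sql.warehouse.dir", "/user/hive/warehouse") // hypothetical Hive warehouse dir
  .config("hive.metastore.uris", "thrift://localhost:9083")  // hypothetical metastore URI
  .enableHiveSupport()
  .getOrCreate()

// my_db.my_json_table is a placeholder for a table declared with a JSON SerDe in Hive
val df = spark.sql("SELECT * FROM my_db.my_json_table")
df.printSchema()
df.show(truncate = false)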