I am new to the Spark world. How can a Hive table stored with a JSON SerDe be read via Spark SQL? Any example piece of code or document would help.
-
Try setting `spark = SparkSession.builder.enableHiveSupport().getOrCreate()` while creating the Spark session if you are using `spark 2.0+` – User12345 May 19 '20 at 16:57
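For context, a minimal Scala sketch of what that comment suggests could look like the following. The database and table names are hypothetical placeholders, and it assumes the JSON SerDe jar (e.g. hive-hcatalog-core) is already on the classpath:

import org.apache.spark.sql.SparkSession

val spark = SparkSession
  .builder()
  .appName("ReadHiveJsonTable")
  .enableHiveSupport() // let Spark use the Hive metastore, so Hive itself resolves the SerDe
  .getOrCreate()

// my_db.my_json_table is a placeholder; once Hive support is enabled,
// a JSON-SerDe table can be queried like any other Hive table
spark.sql("SELECT * FROM my_db.my_json_table").show()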
1 Answer
import org.apache.log4j.{Level, Logger}
import org.apache.spark.sql.SparkSession

object ReadJson {

  val spark = SparkSession // Build the SparkSession
    .builder()
    .appName("ReadJson")
    .master("local[*]")
    .config("spark.sql.shuffle.partitions", "4") // a more reasonable default number of partitions for our data
    .config("spark.app.id", "ReadJson") // to silence the Metrics warning
    .getOrCreate()

  val sc = spark.sparkContext // get the Spark context
  val sqlContext = spark.sqlContext // get the Spark SQL context

  val input = "hdfs://user/..../..../..../file.json" // HDFS path to the file or directory

  def main(args: Array[String]): Unit = {
    Logger.getRootLogger.setLevel(Level.ERROR) // keep application logging quiet
    try {
      val jsonDf = sqlContext
        .read
        .json(input) // read the JSON file into a DataFrame
      jsonDf.show(truncate = false) // show some rows in the console
      jsonDf.createOrReplaceTempView("my_table") // to work with SQL, first create a temporary view
      sqlContext.sql("""SELECT * FROM my_table""").show() // simple query
      // Keep the app alive so you can inspect the Spark web UI, usually at http://localhost:4040/
      println("Type anything in the console to exit......")
      scala.io.StdIn.readLine()
    } finally {
      sc.stop()
      println("SparkContext stopped")
      spark.stop()
      println("SparkSession stopped")
    }
  }
}
Spark programming guide
http://spark.apache.org/docs/2.3.0/sql-programming-guide.html#overview

Chema
-
Actually I am not reading directly from a JSON file; the data is stored in a Hive table in the JSON SerDe format and I don't know how to read the data from there. – Khushboo Yadav May 19 '20 at 13:53
-
You could try that approach. I suppose the data are stored in JSON format, so you read the files, make changes or aggregations with Spark, and save the data, overwriting the directory; Hive will be able to read the changes. Another approach is to enable Hive support, set the Hive warehouse dir, and read the tables directly. What version of Spark are you using? You can follow this link: http://spark.apache.org/docs/2.3.0/sql-programming-guide.html#hive-tables. – Chema May 19 '20 at 14:02
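To make that second approach concrete, here is a hedged Scala sketch; the warehouse path, metastore URI, and table name are all hypothetical placeholders and would need to match your cluster:

import org.apache.spark.sql.SparkSession

val spark = SparkSession
  .builder()
  .appName("HiveJsonSerdeRead")
  .master("local[*]")
  .config("spark.sql.warehouse.dir", "/user/hive/warehouse") // hypothetical Hive warehouse dir
  .config("hive.metastore.uris", "thrift://localhost:9083")  // hypothetical metastore URI
  .enableHiveSupport()
  .getOrCreate()

// my_db.my_json_table is a placeholder for a table declared with a JSON SerDe in Hive
val df = spark.sql("SELECT * FROM my_db.my_json_table")
df.printSchema()
df.show(truncate = false)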