4

I am using Spark 2.1.0 and trying to establish a connection to Hive tables. My Hive data warehouse is at /user/hive/warehouse in HDFS; by listing the contents of that folder I can see all the dbname.db folders in it. After some research I found that in Spark 2.x I need to specify spark.sql.warehouse.dir, and I set it like this:

val spark = SparkSession
      .builder()
      .appName("Spark Hive Example")
      .config("spark.sql.warehouse.dir", "/user/hive/warehouse")
      .enableHiveSupport()
      .getOrCreate()

and now I am trying to print the databases:

spark.sql("show databases").show()

but I only see the default database:

+------------+
|databaseName|
+------------+
|     default|
+------------+

So is there any way I can connect Spark to the existing Hive databases? Is there anything I am missing here?

Justin

3 Answers

5

Your hive-site.xml should be on the classpath. Check this post. If you are using a Maven project, you can keep this file in the resources folder.
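For reference, a minimal hive-site.xml sketch that points Spark at an existing metastore; the thrift URI, port, and warehouse path here are placeholder assumptions, so substitute your cluster's actual values:

```xml
<configuration>
  <!-- Placeholder values: replace with your cluster's metastore URI and warehouse path -->
  <property>
    <name>hive.metastore.uris</name>
    <value>thrift://localhost:9083</value>
  </property>
  <property>
    <name>hive.metastore.warehouse.dir</name>
    <value>/user/hive/warehouse</value>
  </property>
</configuration>
```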

Another way to connect to Hive is via the metastore URI:

val spark = SparkSession
      .builder()
      .appName("Spark Hive Example")
      .master("local[*]")
      .config("hive.metastore.uris", "thrift://localhost:9083")
      .enableHiveSupport()
      .getOrCreate()
abaghel
  • Can you elaborate please, why is yours working, whereas OP's `config("spark.sql.warehouse.dir", "/user/hive/warehouse")` does not? What OP suggests is also suggested in the official documentation. Apparently, what is suggested officially does not work for many people... – Sergey Bushmanov Sep 19 '17 at 18:09
  • Where in Spark official documentation, do you see this is possible to configure Hive like this? – Thomas Decaux Aug 24 '21 at 07:31
0

There is a hive-site.xml file in /usr/lib/hive/conf. Copy this file to /usr/lib/spark/conf and you will see the other databases. Please follow the steps below.

1. Open the Hive console and create a new database:

hive> create database venkat;

2. Close the Hive terminal.

3. Copy the hive-site.xml file:

sudo cp /usr/lib/hive/conf/hive-site.xml /usr/lib/spark/conf/hive-site.xml

4. Check the databases:

sqlContext.sql("show databases").show()

I hope this helps.
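Since the question targets Spark 2.1, the same check can also be done through SparkSession rather than the older sqlContext. A sketch, assuming hive-site.xml has been copied into Spark's conf directory as above:

```scala
import org.apache.spark.sql.SparkSession

// Build a Hive-enabled session; it picks up hive-site.xml from $SPARK_HOME/conf
val spark = SparkSession.builder()
  .appName("Hive check")
  .enableHiveSupport()
  .getOrCreate()

// Should now list all databases from the shared metastore, not just "default"
spark.sql("show databases").show()
```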

coder
mike
0

Step one: configure the required properties under Custom spark2-defaults. (The original answer shows the settings in a screenshot.)

Step two: run the following commands from the Spark shell:

import com.hortonworks.hwc.HiveWarehouseSession
import com.hortonworks.hwc.HiveWarehouseSession._
val hive = HiveWarehouseSession.session(spark).build()
hive.showDatabases().show()


Integrating Apache Hive with Spark and BI: https://docs.hortonworks.com/HDPDocuments/HDP3/HDP-3.0.0/integrating-hive/content/hive_configure_a_spark_hive_connection.html

HiveWarehouseSession API operations: https://docs.hortonworks.com/HDPDocuments/HDP3/HDP-3.0.0/integrating-hive/content/hive_hivewarehousesession_api_operations.html

QiuYi