
I have Java Spark code where I'm trying to connect to a Hive database, but it only shows the default database and throws a NoSuchDatabaseException. I tried the following to set the Hive metastore:

  1. Add a Spark conf in code with the Hive metastore URI
  2. Add a Spark conf in spark-submit (roughly as sketched after this list)
  3. Add the hive-site.xml to the resources folder
  4. Copy the hive-site.xml into the Spark conf directory (/etc/spark2/conf/hive-site.xml)
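
For attempt 2, the conf was passed on the spark-submit command line roughly along these lines (a sketch only, with the same redacted host/port placeholders used elsewhere in this post, not the exact command):

    spark-submit --class sampleClass \
      --conf "hive.metastore.uris=thrift://***:1234" \
      --files /etc/spark/conf/hive-site.xml \
      load-1.0-SNAPSHOT-all.jar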

Also, the Hive config file loaded at run time is the same as /etc/hive/conf/hive-site.xml.

    import org.apache.spark.SparkConf;
    import org.apache.spark.api.java.JavaRDD;
    import org.apache.spark.api.java.JavaSparkContext;
    import org.apache.spark.sql.Dataset;
    import org.apache.spark.sql.Row;
    import org.apache.spark.sql.SparkSession;

    // A standalone JavaSparkContext is created first...
    SparkConf sparkConf = new SparkConf();
    sparkConf.setAppName("example");
    JavaSparkContext sc = new JavaSparkContext(sparkConf);

    // ...and a SparkSession with Hive support is built separately.
    final SparkSession spark = SparkSession
                    .builder()
                    .appName("Java Spark Hive Example")
                    .config("hive.metastore.uris", "thrift://***:1234")
                    .config("spark.sql.uris", "thrift://***:1234")
                    .config("hive.metastore.warehouse.dir", "hdfs://***:1234/user/hive/warehouse/")
                    .enableHiveSupport()
                    .getOrCreate();

    JavaRDD<sampleClass> rdd = sc.parallelize(sample);
    Dataset<Row> df2 = spark.createDataFrame(rdd, sampleClass.class);
    spark.sql("show databases").show();

The logs of the spark-submit run are below.

    spark-submit --class sampleClass \
      --master local --deploy-mode client --executor-memory 1g \
      --name sparkTest --conf "spark.app.id=SampleLoad" \
      --files /etc/spark/conf/hive-site.xml load-1.0-SNAPSHOT-all.jar
20/03/16 12:33:19 INFO SparkContext: Running Spark version 2.3.0.2.6.5.0-292
20/03/16 12:33:19 INFO SparkContext: Submitted application: SampleLoad
20/03/16 12:33:19 INFO SecurityManager: Changing view acls to: root,User
20/03/16 12:33:19 INFO SecurityManager: Changing modify acls to: root,User
20/03/16 12:33:19 INFO SecurityManager: Changing view acls groups to:
20/03/16 12:33:19 INFO SecurityManager: Changing modify acls groups to:
20/03/16 12:33:19 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users  with view permissions: Set(root, User); groups with view permissions: Set(); users  with modify permissions: Set(root, User); groups with modify permissions: Set()
20/03/16 12:33:19 INFO Utils: Successfully started service 'sparkDriver' on port 35746.
20/03/16 12:33:19 INFO SparkEnv: Registering MapOutputTracker
20/03/16 12:33:19 INFO SparkEnv: Registering BlockManagerMaster
20/03/16 12:33:19 INFO BlockManagerMasterEndpoint: Using org.apache.spark.storage.DefaultTopologyMapper for getting topology information
20/03/16 12:33:19 INFO BlockManagerMasterEndpoint: BlockManagerMasterEndpoint up
20/03/16 12:33:19 INFO DiskBlockManager: Created local directory at /tmp/blockmgr-b946b14f-a52d-4467-8028-503ed7ae93da
20/03/16 12:33:19 INFO MemoryStore: MemoryStore started with capacity 366.3 MB
20/03/16 12:33:19 INFO SparkEnv: Registering OutputCommitCoordinator
20/03/16 12:33:19 INFO Utils: Successfully started service 'SparkUI' on port 4042.
20/03/16 12:33:19 INFO SparkUI: Bound SparkUI to 0.0.0.0, and started at http://sample:4042
20/03/16 12:33:19 INFO SparkContext: Added JAR file:/abc/xyz/load-1.0-SNAPSHOT-all.jar at spark://sample:35746/jars/load-1.0-SNAPSHOT-all.jar with timestamp 1584347599756
20/03/16 12:33:19 INFO SparkContext: Added file file:///etc/spark/conf/hive-site.xml at file:///etc/spark/conf/hive-site.xml with timestamp 1584347599776
20/03/16 12:33:19 INFO Utils: Copying /etc/spark/conf/hive-site.xml to /tmp/spark-914265c5-6115-4aca-8b85-2cd49a530fae/userFiles-aaca5153-ce38-489a-a020-c2477fddc66e/hive-site.xml
20/03/16 12:33:19 INFO Executor: Starting executor ID driver on host localhost
20/03/16 12:33:19 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 45179.
20/03/16 12:33:19 INFO NettyBlockTransferService: Server created on sample:45179
20/03/16 12:33:19 INFO BlockManager: Using org.apache.spark.storage.RandomBlockReplicationPolicy for block replication policy
20/03/16 12:33:19 INFO BlockManagerMaster: Registering BlockManager BlockManagerId(driver, sample, 45179, None)
20/03/16 12:33:19 INFO BlockManagerMasterEndpoint: Registering block manager sample:45179 with 366.3 MB RAM, BlockManagerId(driver, lhdpegde2u.enbduat.com, 45179, None)
20/03/16 12:33:19 INFO BlockManagerMaster: Registered BlockManager BlockManagerId(driver, sample, 45179, None)
20/03/16 12:33:19 INFO BlockManager: Initialized BlockManager: BlockManagerId(driver, sample, 45179, None)
20/03/16 12:33:20 INFO EventLoggingListener: Logging events to hdfs:/spark2-history/local-1584347599812
20/03/16 12:33:20 WARN SparkContext: Using an existing SparkContext; some configuration may not take effect.
20/03/16 12:33:20 INFO SharedState: loading hive config file: file:/etc/spark2/2.6.5.0-292/0/hive-site.xml
20/03/16 12:33:21 INFO SharedState: spark.sql.warehouse.dir is not set, but hive.metastore.warehouse.dir is set. Setting spark.sql.warehouse.dir to the value of hive.metastore.warehouse.dir ('/apps/hive/warehouse').
20/03/16 12:33:21 INFO SharedState: Warehouse path is '/apps/hive/warehouse'.
20/03/16 12:33:21 INFO StateStoreCoordinatorRef: Registered StateStoreCoordinator endpoint
20/03/16 12:33:22 INFO CodeGenerator: Code generated in 184.728545 ms
20/03/16 12:33:23 INFO CodeGenerator: Code generated in 10.538159 ms
20/03/16 12:33:23 INFO CodeGenerator: Code generated in 8.809847 ms
+-------+----------------+--------------------+
|   name|     description|         locationUri|
+-------+----------------+--------------------+
|default|default database|/apps/hive/warehouse|
+-------+----------------+--------------------+

20/03/16 12:33:23 INFO CodeGenerator: Code generated in 7.13541 ms
20/03/16 12:33:23 INFO CodeGenerator: Code generated in 5.771691 ms
+------------+
|databaseName|
+------------+
|     default|
+------------+

Exception in thread "main" org.apache.spark.sql.catalyst.analysis.NoSuchDatabaseException: Database 'sample' not found;
        at org.apache.spark.sql.catalyst.catalog.SessionCatalog.org$apache$spark$sql$catalyst$catalog$SessionCatalog$$requireDbExists(SessionCatalog.scala:177)
        at org.apache.spark.sql.catalyst.catalog.SessionCatalog.setCurrentDatabase(SessionCatalog.scala:259)
        at org.apache.spark.sql.execution.command.SetDatabaseCommand.run(databases.scala:59)
        at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:70)
        at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:68)
        at org.apache.spark.sql.execution.command.ExecutedCommandExec.executeCollect(commands.scala:79)
        at org.apache.spark.sql.Dataset$$anonfun$6.apply(Dataset.scala:190)
        at org.apache.spark.sql.Dataset$$anonfun$6.apply(Dataset.scala:190)
        at org.apache.spark.sql.Dataset$$anonfun$52.apply(Dataset.scala:3253)
        at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:77)
        at org.apache.spark.sql.Dataset.withAction(Dataset.scala:3252)
        at org.apache.spark.sql.Dataset.<init>(Dataset.scala:190)
        at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:75)
        at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:638)
        at ProcessXML.main(ProcessXML.java:95)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
        at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:906)
        at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:197)
        at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:227)
        at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:136)
        at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
20/03/16 12:33:23 INFO SparkContext: Invoking stop() from shutdown hook
20/03/16 12:33:23 INFO SparkUI: Stopped Spark web UI at http://sample:4042
20/03/16 12:33:24 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
20/03/16 12:33:24 INFO MemoryStore: MemoryStore cleared
20/03/16 12:33:24 INFO BlockManager: BlockManager stopped
20/03/16 12:33:24 INFO BlockManagerMaster: BlockManagerMaster stopped
20/03/16 12:33:24 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
20/03/16 12:33:24 INFO SparkContext: Successfully stopped SparkContext
20/03/16 12:33:24 INFO ShutdownHookManager: Shutdown hook called
20/03/16 12:33:24 INFO ShutdownHookManager: Deleting directory /tmp/spark-37386c3b-855a-4e09-a372-e8d12a08eebc
20/03/16 12:33:24 INFO ShutdownHookManager: Deleting directory /tmp/spark-914265c5-6115-4aca-8b85-2cd49a530fae

Kindly let me know what or where I went wrong.

Thanks in advance,

Gowtham R

  • `hive.metastore.uris` is not a Spark property, it's a Hadoop property (used by the Hive Metastore lib). Try `spark.hadoop.hive.metastore.uris` instead, the prefix means that it should be pushed dynamically to the Hadoop conf. – Samson Scharfrichter Mar 16 '20 at 10:14
  • Also, your logs state that `hive.metastore.warehouse.dir` is deprecated, and has been replaced by a property starting with `spark.sql` (as it should have from the beginning) – Samson Scharfrichter Mar 16 '20 at 10:15
  • Also, **why** do you feed a list of static Hive properties via `hive-site.xml` and **also** want to override these properties by injecting dynamically `hive.metastore.uris` ???? – Samson Scharfrichter Mar 16 '20 at 10:18
  • My target is to access the hive databases, since it is not working I am trying to set multiple properties. Setting spark.hadoop.hive.metastore.uris also doesn't work. – Gowtham Mar 16 '20 at 10:28
  • My hive-site.xml is the default one available for hive in /etc/hive/conf/hive-site.xml. Also, when I use pyspark, it's working. Only when running it as a Java program, it's not connecting to Hive. – Gowtham Mar 16 '20 at 11:09
  • _"when i use pyspark it's working"_ > you mean the PySpark shell, or a `spark-submit` with a Python script? Does it work with Spark-Shell and Scala i.e. `spark.sql("show databases").show` ? – Samson Scharfrichter Mar 16 '20 at 19:37
  • Ahhh... what is this abomination of using a "JavaSparkContext" not attached to the SparkSession??? Might explain the warning about `some configuration may not take effect.` – Samson Scharfrichter Mar 16 '20 at 19:40
  • Using the Context from Spark Session did the trick for me. It worked. Thanks – Gowtham Mar 17 '20 at 09:13
  • @Gowtham how you resolved using context is it sqlcontext? can you share sample code here? Appreciate. – Prakash Raj Sep 28 '21 at 13:50
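
Per the resolution in the comments above — deriving the JavaSparkContext from the Hive-enabled SparkSession instead of constructing a separate one — a minimal sketch of that fix could look like the following. This is an illustration, not the asker's actual code: the `spark.hadoop.` prefix follows the earlier comment's suggestion, the host/port remain redacted placeholders, and `sample`/`sampleClass` are the question's own bean list and class.

    import org.apache.spark.api.java.JavaRDD;
    import org.apache.spark.api.java.JavaSparkContext;
    import org.apache.spark.sql.Dataset;
    import org.apache.spark.sql.Row;
    import org.apache.spark.sql.SparkSession;

    // Build the Hive-enabled SparkSession first; the spark.hadoop. prefix
    // pushes the property into the underlying Hadoop configuration.
    SparkSession spark = SparkSession
            .builder()
            .appName("Java Spark Hive Example")
            .config("spark.hadoop.hive.metastore.uris", "thrift://***:1234")
            .enableHiveSupport()
            .getOrCreate();

    // Wrap the session's own context rather than constructing a second one,
    // so the RDD work and the Hive catalog share a single SparkContext.
    JavaSparkContext sc = JavaSparkContext.fromSparkContext(spark.sparkContext());

    JavaRDD<sampleClass> rdd = sc.parallelize(sample);
    Dataset<Row> df2 = spark.createDataFrame(rdd, sampleClass.class);
    spark.sql("show databases").show();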

0 Answers