
I have installed Hadoop 2.8.1 on Ubuntu and then installed spark-2.2.0-bin-hadoop2.7 on top of it. When I first created a database through spark-shell and tried to access it through a Java JDBC program, I saw that no tables existed. Then I used beeline and observed that the databases did not exist there either, so I created databases through beeline. Why do spark-shell and beeline show different databases?

They should presumably show the same. I tried a plain JDBC program that connects to HiveServer2 and fetches the tables, and observed that it sometimes shows the tables created through spark-shell and sometimes those created through beeline... Please help. The same sometimes happens with beeline too.
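For reference, the lookup described above can be reproduced with a plain JDBC program along these lines. This is only a sketch: it assumes the Hive JDBC driver is on the classpath and that HiveServer2 (or the Spark Thrift server) listens on its default port 10000; the host, port, database name, and class name are illustrative.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class ListHiveTables {

    // Builds a HiveServer2 JDBC URL; 10000 is HiveServer2's default port.
    static String buildUrl(String host, int port, String db) {
        return "jdbc:hive2://" + host + ":" + port + "/" + db;
    }

    public static void main(String[] args) throws Exception {
        String url = buildUrl("localhost", 10000, "default");
        // Empty user/password are typical for an unsecured sandbox setup.
        try (Connection conn = DriverManager.getConnection(url, "", "");
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery("SHOW TABLES")) {
            while (rs.next()) {
                System.out.println(rs.getString(1));
            }
        }
    }
}
```

Which tables this prints depends entirely on which metastore the server behind that URL is using, which is the crux of the question.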

ABC

1 Answer


This is probably because your Spark installation is not configured properly to access your Hive warehouse.

In that case, Spark SQL falls back to a standalone mode and sets up its own local warehouse: an embedded Derby metastore (`metastore_db/`) plus a `spark-warehouse/` directory, both created in whatever working directory the shell was launched from. This is intended to ease adoption for non-Hive users, but it also means that clients launched from different directories (or pointed at different metastores) each see their own set of databases — which is consistent with what you describe.

To troubleshoot this, you should:

  1. Refer to the official documentation.
  2. Read the logs and look for anything related to "hive" or "metastore" to understand what happens.
  3. Make sure that Spark has access to the hive-site.xml configuration file. You can, for instance, set up a symlink (be sure to check the paths first):

    ln -s /etc/hive/conf/hive-site.xml    /etc/spark/conf/hive-site.xml
    
  4. Make sure that your Spark installation has access to the Hive jars (in Spark 2.x, check $SPARK_HOME/jars).
  5. Make sure you have enabled Hive support with something like this:

    SparkSession.builder.enableHiveSupport().getOrCreate()
    
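The point of step 3 is that spark-shell, beeline, and your JDBC client should all resolve databases through the same metastore. A minimal hive-site.xml pointing every client at one shared metastore service could look like the sketch below; the Thrift URI and warehouse path are placeholders for your own setup:

```xml
<configuration>
  <!-- All clients reading this file share one metastore, hence one set of databases -->
  <property>
    <name>hive.metastore.uris</name>
    <value>thrift://localhost:9083</value>
  </property>
  <!-- Default location for managed tables' data -->
  <property>
    <name>hive.metastore.warehouse.dir</name>
    <value>/user/hive/warehouse</value>
  </property>
</configuration>
```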

Hope this helps.

FurryMachine
  • I am new to Spark, so sorry for the silly questions. I didn't install Hive, and I can't find hive-site.xml in the Spark folder. I also don't have any Hive folder on my file system. Can you please help? – ABC Aug 28 '17 at 06:25
  • Can you please also help me here: [link](https://stackoverflow.com/questions/45819568/why-there-are-many-spark-warehouse-folders-got-created) – ABC Aug 28 '17 at 07:06
  • I'm not sure I understand your setup, then. If you didn't install Hive, how can you use beeline? To use beeline, you are supposed to connect it to a HiveServer2 URL — which one is it? If you only want to try spark-sql in a sandbox environment, you can try this Docker image: https://github.com/FurcyPin/docker-hive-spark It sets up a Hive Metastore and a Spark Thrift server (HiveServer2) and opens a spark-sql shell connected to it. You can also look at the Dockerfile to get started installing a similar environment. – FurryMachine Aug 28 '17 at 09:03
  • 'jdbc:hive2://localhost:10000/default' is my url. I am still not finding hive-site.xml anywhere..Any guess? – ABC Aug 28 '17 at 10:26
  • hive-site.xml is a configuration file that allows a Hive client like Spark to connect to a HiveServer2. You have to create and edit it yourself. Did you set up a HiveServer2 or a Spark Thrift server? https://cwiki.apache.org/confluence/display/Hive/Setting+Up+HiveServer2 https://hortonworks.com/tutorial/spark-sql-thrift-server-example/ If you want to get started with Hive and Spark, I would recommend starting with a Hortonworks or Cloudera sandbox to have a preconfigured environment. If you are really new to Hive and Spark, don't use beeline; stick to spark-sql. – FurryMachine Aug 28 '17 at 10:46
  • @Furry `beeline` is in `$SPARK_HOME/bin` – OneCricketeer Aug 28 '17 at 10:49
  • @FurryMachine No, I did not set up HiveServer2. I guess the Thrift server is set up automatically while installing Spark. I will take a preconfigured environment directly. Btw, do you have any hive-site.xml with you for a demo? – ABC Aug 28 '17 at 11:27
  • To use beeline, you have to start the Spark Thriftserver like explained here: https://hortonworks.com/tutorial/spark-sql-thrift-server-example/ Otherwise, just start spark-sql. You can find an example of hive-site.xml here, but you don't need it if you only use spark and didn't install hive. https://github.com/apache/spark/blob/master/sql/hive/src/test/resources/data/conf/hive-site.xml – FurryMachine Aug 28 '17 at 11:32
  • Thanks. My goal is just to read data from Apache Spark through Java and use it for my purposes, either by storing it somewhere or by running Spark SQL on it. – ABC Aug 28 '17 at 11:45