
I have installed Hadoop 2.8.1 on Ubuntu and then installed spark-2.2.0-bin-hadoop2.7 on top of it. When I first created a database through spark-shell and tried to access it through a Java JDBC program, I saw that no tables existed. Then I used beeline and observed that the databases did not exist there either, so I created databases through beeline. Why do spark-shell and beeline show different databases?

They should presumably show the same. I tried a plain JDBC program that connects to HiveServer2 and fetches the tables, and observed that it sometimes shows the tables created through spark-shell and sometimes those created through beeline... Please help. The same sometimes happens with beeline too.
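For reference, the lookup described above can be reproduced with a plain JDBC program along these lines. This is only a sketch: it assumes the Hive JDBC driver is on the classpath and that HiveServer2 (or the Spark Thrift server) listens on its default port 10000; the host, port, database name, and class name are illustrative.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class ListHiveTables {

    // Builds a HiveServer2 JDBC URL; 10000 is HiveServer2's default port.
    static String buildUrl(String host, int port, String db) {
        return "jdbc:hive2://" + host + ":" + port + "/" + db;
    }

    public static void main(String[] args) throws Exception {
        String url = buildUrl("localhost", 10000, "default");
        // Empty user/password are typical for an unsecured sandbox setup.
        try (Connection conn = DriverManager.getConnection(url, "", "");
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery("SHOW TABLES")) {
            while (rs.next()) {
                System.out.println(rs.getString(1));
            }
        }
    }
}
```

Which tables this prints depends entirely on which metastore the server behind that URL is using, which is the crux of the question.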

ABC

1 Answer


This is probably because your Spark installation is not configured properly to access your Hive warehouse.

In that case, Spark SQL falls back to a standalone mode and sets up its own local warehouse: an embedded Derby metastore (`metastore_db/`) plus a `spark-warehouse/` directory, both created in whatever working directory the shell was launched from. This is intended to ease adoption for non-Hive users, but it also means that clients launched from different directories (or pointed at different metastores) each see their own set of databases — which is consistent with what you describe.

To troubleshoot this, you should:

  1. Refer to the official documentation.
  2. Read the logs and look for anything related to "hive" or "metastore" to understand what happens.
  3. Make sure that Spark has access to the hive-site.xml configuration file. You can, for instance, set up a symlink (be sure to check the paths first):

    ln -s /etc/hive/conf/hive-site.xml    /etc/spark/conf/hive-site.xml
    
  4. Make sure that your Spark installation has access to the Hive jars (in Spark 2.x, check $SPARK_HOME/jars).
  5. Make sure you have enabled Hive support with something like this:

    SparkSession.builder.enableHiveSupport().getOrCreate()
    
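The point of step 3 is that spark-shell, beeline, and your JDBC client should all resolve databases through the same metastore. A minimal hive-site.xml pointing every client at one shared metastore service could look like the sketch below; the Thrift URI and warehouse path are placeholders for your own setup:

```xml
<configuration>
  <!-- All clients reading this file share one metastore, hence one set of databases -->
  <property>
    <name>hive.metastore.uris</name>
    <value>thrift://localhost:9083</value>
  </property>
  <!-- Default location for managed tables' data -->
  <property>
    <name>hive.metastore.warehouse.dir</name>
    <value>/user/hive/warehouse</value>
  </property>
</configuration>
```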

Hope this helps.

FurryMachine
  • I am new to Spark, so sorry for the silly questions. I didn't install Hive, and I can't find hive-site.xml in the Spark folder. I also don't have any Hive folder on my file system. Can you please help? – ABC Aug 28 '17 at 06:25
  • Can you please also help me here: [link](https://stackoverflow.com/questions/45819568/why-there-are-many-spark-warehouse-folders-got-created) – ABC Aug 28 '17 at 07:06
  • I'm not sure I understand your setup, then. If you didn't install Hive, how can you use beeline? To use beeline, you are supposed to connect it to a HiveServer2 URL — which one is it? If you only want to try spark-sql in a sandbox environment, you can try this Docker image: https://github.com/FurcyPin/docker-hive-spark It sets up a Hive Metastore and a Spark Thrift server (HiveServer2) and opens a spark-sql shell connected to it. You can also look at the Dockerfile to get started installing a similar environment. – FurryMachine Aug 28 '17 at 09:03
  • 'jdbc:hive2://localhost:10000/default' is my url. I am still not finding hive-site.xml anywhere..Any guess? – ABC Aug 28 '17 at 10:26
  • hive-site.xml is a configuration file that allows a Hive client like Spark to connect to a HiveServer2. You have to create and edit it yourself. Did you set up a HiveServer2 or a Spark Thrift server? https://cwiki.apache.org/confluence/display/Hive/Setting+Up+HiveServer2 https://hortonworks.com/tutorial/spark-sql-thrift-server-example/ If you want to get started with Hive and Spark, I would recommend starting with a Hortonworks or Cloudera sandbox to have a preconfigured environment. If you are really new to Hive and Spark, don't use beeline; stick to spark-sql. – FurryMachine Aug 28 '17 at 10:46
  • @Furry `beeline` is in `$SPARK_HOME/bin` – OneCricketeer Aug 28 '17 at 10:49
  • @FurryMachine No, I did not set up HiveServer2. I guess the Thrift server is set up automatically while installing Spark. I will take a preconfigured environment directly. Btw, do you have any hive-site.xml with you for a demo? – ABC Aug 28 '17 at 11:27
  • To use beeline, you have to start the Spark Thriftserver like explained here: https://hortonworks.com/tutorial/spark-sql-thrift-server-example/ Otherwise, just start spark-sql. You can find an example of hive-site.xml here, but you don't need it if you only use spark and didn't install hive. https://github.com/apache/spark/blob/master/sql/hive/src/test/resources/data/conf/hive-site.xml – FurryMachine Aug 28 '17 at 11:32
  • Thanks. My goal is just to read data from Apache Spark through Java and use it for my purposes, either by storing it somewhere or by running Spark SQL on it. – ABC Aug 28 '17 at 11:45