I am new to Spark and need help figuring out why my Hive databases are not accessible when I try to load data through Spark.

Background:

  1. I am running Hive, Spark, and my Java program on a single machine: a Cloudera QuickStart VM (CDH 5.4.x) running on VirtualBox.

  2. I have downloaded a pre-built Spark 1.3.1 distribution.

  3. I am using the Hive bundled with the VM and can run Hive queries through spark-shell and the Hive command line without any issue. This includes running the command:

    LOAD DATA INPATH 'hdfs://quickstart.cloudera:8020/user/cloudera/test_table/result.parquet/' INTO TABLE test_spark.test_table PARTITION(part = '2015-08-21');
    

Problem:

I am writing a Java program to read data from Cassandra and load it into Hive. I have saved the results of the Cassandra read in parquet format in a folder called 'result.parquet'.

Now I would like to load this into Hive. To do this, I:

  1. Copied hive-site.xml to the Spark conf folder.

    • I made one change to this XML file. I noticed that I had two hive-site.xml files, one auto-generated and another containing Hive execution parameters, so I combined both into a single hive-site.xml.
  2. Code used (Java):

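    import org.apache.spark.api.java.JavaSparkContext;
    import org.apache.spark.sql.hive.HiveContext;

    // sc is an existing JavaSparkContext
    HiveContext hiveContext = new HiveContext(JavaSparkContext.toSparkContext(sc));
    hiveContext.sql("show databases").show();
    // The statement is split across two string literals so that it compiles
    hiveContext.sql(
        "LOAD DATA INPATH 'hdfs://quickstart.cloudera:8020/user/cloudera/test_table/result.parquet/' "
        + "INTO TABLE test_spark.test_table PARTITION(part = '2015-08-21')").show();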
    

This worked, and I could load data into Hive. However, after I restarted my VM, it stopped working.

When I run the show databases Hive query, I get a result saying

result
default

instead of the databases in Hive, which are

default
test_spark

I also notice a folder called metastore_db being created in my project folder. From googling around, I know this happens when Spark can't connect to the Hive metastore, so it creates one of its own. I thought I had fixed that, but clearly not.

What am I missing?

  • Hi Mithila... you must place the Spark conf folder on the classpath – wazza Jul 27 '15 at 06:07
  • Thanks! That worked. However, I forgot to rebuild the project after a mvn clean, so the solution seemed not to be working. All good now :) – Mithila Joshi Aug 03 '15 at 21:51
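
For anyone hitting the same symptom, one quick way to check the classpath suggestion in the comments is to ask the JVM whether it can see hive-site.xml at all. The following is only a minimal sketch using the standard library: the class name HiveSiteCheck and the helper locate are hypothetical names chosen for illustration, and it assumes the check is run with the same classpath as the Spark program. It only tells you whether the file is visible as a classpath resource, not whether its contents are correct.

```java
import java.net.URL;

public class HiveSiteCheck {

    // Hypothetical helper: returns the classpath location of the named
    // resource, or null if no entry on the classpath provides it.
    static URL locate(String resourceName) {
        ClassLoader loader = Thread.currentThread().getContextClassLoader();
        if (loader == null) {
            // Fall back to the system class loader if no context loader is set
            loader = ClassLoader.getSystemClassLoader();
        }
        return loader.getResource(resourceName);
    }

    public static void main(String[] args) {
        URL hiveSite = locate("hive-site.xml");
        if (hiveSite == null) {
            // Consistent with the symptom in the question: only "default"
            // is listed and a local metastore_db folder appears
            System.out.println("hive-site.xml is NOT on the classpath");
        } else {
            System.out.println("hive-site.xml found at " + hiveSite);
        }
    }
}
```

If this reports that the file is missing, adding the Spark conf folder to the project's classpath and rebuilding (as in the comments above) is the first thing to try.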
