Working with Spark 2.0.2, I have a jar that works fine with spark-submit. Now I want to use it from Spark JobServer.

The first problem was that the methods:

@Override
public SparkJobValidation validate(SparkContext sc, Config config) {
    // SparkJobValid is a Scala object; from Java it is reached via MODULE$
    return SparkJobValid$.MODULE$;
}

@Override
public Object runJob(SparkContext jsc, Config jobConfig) {
    //code
}

take the deprecated SparkContext as a parameter instead of a SparkSession. My solution was to do the following:

@Override
public Object runJob(SparkContext jsc, Config jobConfig) {
    SparkSession ss = SparkSession.builder()
            .sparkContext(jsc)
            .enableHiveSupport()
            .getOrCreate();

    return ss.table("purchases").showString(20, true);
}
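
Note that sparkContext(jsc) wraps the SparkContext that JobServer already created instead of starting a second one; getOrCreate() returns any session already bound to that context, so builder options like enableHiveSupport() may only take effect for the first session created on it.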

I don't have Hive installed; I'm just using the Hive support that comes with Spark. I put hive-site.xml under $SPARK_HOME/conf, and that works with spark-submit.

hive-site.xml

<?xml version="1.0"?>
<configuration>
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:mysql://localhost:3306/metastore_db?createDatabaseIfNotExist=true</value>
    <description>JDBC connect string for a JDBC metastore</description>
  </property>

  <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>com.mysql.jdbc.Driver</value>
    <description>Driver class name for a JDBC metastore</description>
  </property>

  <property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>user</value>
    <description>username to use against metastore database</description>
  </property>

  <property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>password</value>
    <description>password to use against metastore database</description>
  </property>

  <property>
    <name>hive.metastore.warehouse.dir</name>
    <value>/spark-warehouse/</value>
    <description>Warehouse Location</description>
  </property>
</configuration>

But when I execute this jar as a job on Spark JobServer, the only thing it takes from this config file is hive.metastore.warehouse.dir. It makes no connection to the MySQL DB to read/save the Hive metastore_db, and of course it cannot see the tables in the default database. I have mysql-connector-java-5.1.40-bin.jar in the $SPARK_HOME/jars folder.

What can I do to connect to the Hive metastore_db located in my MySQL DB?
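
(For anyone reproducing this, here is a minimal, untested sketch of passing the same settings programmatically, in case hive-site.xml is simply not being read in the JobServer context. Keys prefixed with spark.hadoop. are copied by Spark into the Hadoop configuration that the Hive metastore client reads, and spark.sql.warehouse.dir is Spark 2.0's replacement for the warehouse property; the values just mirror the file above.)

SparkSession ss = SparkSession.builder()
        .sparkContext(jsc)
        .enableHiveSupport()
        // same values as in hive-site.xml; the spark.hadoop. prefix forwards
        // them into the Hadoop configuration used by the metastore client
        .config("spark.hadoop.javax.jdo.option.ConnectionURL",
                "jdbc:mysql://localhost:3306/metastore_db?createDatabaseIfNotExist=true")
        .config("spark.hadoop.javax.jdo.option.ConnectionDriverName", "com.mysql.jdbc.Driver")
        .config("spark.hadoop.javax.jdo.option.ConnectionUserName", "user")
        .config("spark.hadoop.javax.jdo.option.ConnectionPassword", "password")
        .config("spark.sql.warehouse.dir", "/spark-warehouse/")
        .getOrCreate();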

1 Answer

Use the Hive context instead of the basic context. Since I'm using Java, I had to set context-factory=spark.jobserver.context.JavaHiveContextFactory when creating the context, and I implemented a class like the following:

public class My_SparkHIVEJob implements JHiveJob<String> {

    @Override
    public String run(HiveContext c, JobEnvironment je, Config config) {
        /*
            JOB CODE...
        */
    }

    @Override
    public Config verify(HiveContext c, JobEnvironment je, Config config) {
        return config;
    }
}
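
The context then has to be created with that factory before submitting the job; assuming JobServer's default port 8090 and a context named hive-context, something like:

curl -d "" "localhost:8090/contexts/hive-context?context-factory=spark.jobserver.context.JavaHiveContextFactory"

after which jobs are posted with context=hive-context.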

It seems pretty easy, but some months ago, when I was starting with Spark and Spark JobServer, it wasn't :-)