
I'm experimenting with a Databricks Spark cluster. When creating a table in a Hive database, I get the following error the first time:

19/06/18 21:34:17 ERROR SparkExecuteStatementOperation: Error running hive query: 
org.apache.hive.service.cli.HiveSQLException: java.lang.NoClassDefFoundError: org/joda/time/ReadWritableInstant
    at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation.org$apache$spark$sql$hive$thriftserver$SparkExecuteStatementOperation$$execute(SparkExecuteStatementOperation.scala:296)
    at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation$$anon$1$$anon$2$$anonfun$run$2.apply$mcV$sp(SparkExecuteStatementOperation.scala:182)
    at org.apache.spark.sql.hive.thriftserver.server.SparkSQLUtils$class.withLocalProperties(SparkSQLOperationManager.scala:190)

On subsequent attempts to create the same table (without restarting the cluster), I get this...

org.apache.hive.service.cli.HiveSQLException: java.lang.NoClassDefFoundError: Could not initialize class org.apache.hadoop.hive.serde2.lazy.objectinspector.primitive.LazyPrimitiveObjectInspectorFactory
    at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation.org$apache$spark$sql$hive$thriftserver$SparkExecuteStatementOperation$$execute(SparkExecuteStatementOperation.scala:296)
    at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation$$anon$1$$anon$2$$anonfun$run$2.apply$mcV$sp(SparkExecuteStatementOperation.scala:182)
    at org.apache.spark.sql.hive.thriftserver.server.SparkSQLUtils$class.withLocalProperties(SparkSQLOperationManager.scala:190)
    at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation.withLocalProperties(SparkExecuteStatementOperation.scala:44)

From the beeline client, I get the following errors, which are essentially the same thing:

13: jdbc:spark://dbc-e1ececb9-10d2.cloud.data> create table test_dnax_db.sample2 (name2 string);
Error: [Simba][SparkJDBCDriver](500051) ERROR processing query/statement. Error Code: 0, SQL state: java.lang.NoClassDefFoundError: org/joda/time/ReadWritableInstant, Query: create table test_dnax_db.sample2 (name2 string). (state=HY000,code=500051)
13: jdbc:spark://dbc-e1ececb9-10d2.cloud.data> create table test_dnax_db.sample2 (name2 string);
Error: [Simba][SparkJDBCDriver](500051) ERROR processing query/statement. Error Code: 0, SQL state: java.lang.NoClassDefFoundError: Could not initialize class org.apache.hadoop.hive.serde2.lazy.objectinspector.primitive.LazyPrimitiveObjectInspectorFactory, Query: create table test_dnax_db.sample2 (name2 string). (state=HY000,code=500051)

I've tried uploading the dependent joda-time and SerDe jars using Databricks' Libraries feature. I've also set the Spark property spark.driver.extraClassPath (the error comes from the Spark driver, not the workers). Neither helps. I do see the dependent jars on the hosts, in the /databricks/hive and /databricks/jars folders.
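For reference, this is roughly the entry I put in the cluster's Spark config (the wildcard path is an assumption on my part, based on where I see the jars landing on the hosts):

    # sketch of the Spark config entry; /databricks/jars is where the uploaded jars appear on the hosts
    spark.driver.extraClassPath /databricks/jars/*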

I've also tried to set environment variables like HADOOP_CLASSPATH, without much luck.
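In case it matters, this is roughly how I tried that, via a cluster init script (appending to spark-env.sh is an assumption based on the standard Spark layout; the exact path may differ on Databricks):

    #!/bin/bash
    # hypothetical init script: export HADOOP_CLASSPATH before the driver JVM starts
    echo "export HADOOP_CLASSPATH=/databricks/jars/*" >> /databricks/spark/conf/spark-env.sh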

The Databricks forums are notoriously unhelpful, as they are not curated at all (compared to Splunk or similar commercial products).

Any suggestions welcome.

I can successfully create a database using the LOCATION keyword, as well as query an existing table in the metastore.
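To illustrate what does and doesn't work (the storage path is a made-up placeholder, and existing_table stands in for a table that already exists in the metastore):

    -- succeeds: create a database with an explicit location (path is hypothetical)
    create database test_dnax_db location 's3://some-bucket/test_dnax_db';
    -- succeeds: query a pre-existing table
    select * from test_dnax_db.existing_table limit 10;
    -- fails with the NoClassDefFoundError shown above
    create table test_dnax_db.sample2 (name2 string);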

EDIT:

I suspect SparkExecuteStatementOperation (the Thrift entry point for SQL execution in the Spark cluster, running on the driver) might be using a classloader different from the application's. I added the following to a static block in my application class, which I know gets initialized, and I see no ClassNotFoundException, i.e. the jar is available to the application. But the underlying driver still does not see the relevant jar.

static {
    try {
        // probe the classpath: this succeeds, so the jar is visible to the application classloader
        Class<?> aClass = Class.forName("org.joda.time.ReadWritableInstant");
    } catch (ClassNotFoundException e) {
        LOG.warn("Unable to find ReadWritableInstant class", e);
    }
}
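My next step is to log which classloaders are actually in play. A minimal sketch of what I have in mind (MyApp stands in for my application class):

    // compare the loader that loaded my class against the thread context loader,
    // and check where joda-time resolves from, if anywhere
    ClassLoader appLoader = MyApp.class.getClassLoader();
    ClassLoader ctxLoader = Thread.currentThread().getContextClassLoader();
    LOG.info("application loader: {}", appLoader);
    LOG.info("context loader:     {}", ctxLoader);
    LOG.info("joda-time location: {}", appLoader.getResource("org/joda/time/ReadWritableInstant.class"));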
