
I'm using Apache Spark 2.1.1 and I'd like to set it up with an external Hive metastore (for Spark Thrift Server specifically).

I have added a hive-site.xml to the $SPARK_HOME/conf folder, which looks as follows:

<?xml version="1.0"?>
<configuration>
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:mysql://home.cu:3306/hive_metastore?createDatabaseIfNotExist=true&amp;useLegacyDatetimeCode=false&amp;serverTimezone=Europe/Berlin&amp;nullNamePatternMatchesAll=true </value>
    <description>JDBC connect string for a JDBC metastore</description>
  </property>

  <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>com.mysql.jdbc.Driver</value>
    <description>Driver class name for a JDBC metastore</description>
  </property>

  <property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>hive</value>
    <description>username to use against metastore database</description>
  </property>

  <property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>hive</value>
    <description>password to use against metastore database</description>
  </property>
  <property>
    <name>hive.metastore.schema.verification</name>
    <value>false</value>
    <description>Disable metastore schema version verification</description>
  </property>

  <property>
    <name>hive.metastore.warehouse.dir</name>
    <value>hdfs://spark-master.cu:9000/value_iq/hive_warehouse/</value>
    <description>Warehouse Location</description>
  </property>
</configuration>

Whenever I try to run spark-shell or the Spark Thrift Server, they attempt to create the Hive metastore in MySQL (as there is no metastore yet) and fail with the following error:

17/07/13 19:57:55 ERROR Datastore: Error thrown executing ALTER TABLE `PARTITIONS` ADD COLUMN `TBL_ID` BIGINT NULL : Table 'hive_metastore.partitions' doesn't exist
java.sql.SQLSyntaxErrorException: Table 'hive_metastore.partitions' doesn't exist
        at com.mysql.cj.jdbc.exceptions.SQLError.createSQLException(SQLError.java:536)
        at com.mysql.cj.jdbc.exceptions.SQLError.createSQLException(SQLError.java:513)
        at com.mysql.cj.jdbc.exceptions.SQLExceptionsMapping.translateException(SQLExceptionsMapping.java:115)
        at com.mysql.cj.jdbc.ConnectionImpl.execSQL(ConnectionImpl.java:1983)
        at com.mysql.cj.jdbc.ConnectionImpl.execSQL(ConnectionImpl.java:1936)
        at com.mysql.cj.jdbc.StatementImpl.executeInternal(StatementImpl.java:891)
        at com.mysql.cj.jdbc.StatementImpl.execute(StatementImpl.java:795)
        at com.jolbox.bonecp.StatementHandle.execute(StatementHandle.java:254)
        at org.datanucleus.store.rdbms.table.AbstractTable.executeDdlStatement(AbstractTable.java:760)
        at org.datanucleus.store.rdbms.table.AbstractTable.executeDdlStatementList(AbstractTable.java:711)
        at org.datanucleus.store.rdbms.table.TableImpl.validateColumns(TableImpl.java:259)
        at org.datanucleus.store.rdbms.RDBMSStoreManager$ClassAdder.performTablesValidation(RDBMSStoreManager.java:3393)
        at org.datanucleus.store.rdbms.RDBMSStoreManager$ClassAdder.addClassTablesAndValidate(RDBMSStoreManager.java:3190)
        at org.datanucleus.store.rdbms.RDBMSStoreManager$ClassAdder.run(RDBMSStoreManager.java:2841)
        at org.datanucleus.store.rdbms.AbstractSchemaTransaction.execute(AbstractSchemaTransaction.java:122)
        at org.datanucleus.store.rdbms.RDBMSStoreManager.addClasses(RDBMSStoreManager.java:1605)
        at org.datanucleus.store.AbstractStoreManager.addClass(AbstractStoreManager.java:954)
        at org.datanucleus.store.rdbms.RDBMSStoreManager.getDatastoreClass(RDBMSStoreManager.java:679)
        at org.datanucleus.store.rdbms.query.RDBMSQueryUtils.getStatementForCandidates(RDBMSQueryUtils.java:408)
        at org.datanucleus.store.rdbms.query.JDOQLQuery.compileQueryFull(JDOQLQuery.java:947)
        at org.datanucleus.store.rdbms.query.JDOQLQuery.compileInternal(JDOQLQuery.java:370)
        at org.datanucleus.store.query.Query.executeQuery(Query.java:1744)
        at org.datanucleus.store.query.Query.executeWithArray(Query.java:1672)
        at org.datanucleus.store.query.Query.execute(Query.java:1654)
        at org.datanucleus.api.jdo.JDOQuery.execute(JDOQuery.java:221)
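
The failure is reproducible from plain spark-shell with any statement that touches the catalog, for example (the statement itself doesn't matter):

// in spark-shell, the Hive-enabled SparkSession is already available as `spark`;
// the first catalog access triggers the metastore schema creation and fails as above
spark.sql("SHOW DATABASES").show()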
– Jacek Laskowski

4 Answers


I have found the problem: it was related to the MySQL driver. I was using mysql-connector-java-6.0.6-bin.jar and I replaced it with an older one, mysql-connector-java-5.1.23-bin.jar, and now it works.
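
As a sketch of how the replacement jar can be picked up (the /opt/jars path below is only illustrative): since the metastore client runs inside the driver, putting the 5.1.x connector on the driver classpath in $SPARK_HOME/conf/spark-defaults.conf is enough:

# illustrative location; point this at wherever the 5.1.x connector jar actually lives
spark.driver.extraClassPath /opt/jars/mysql-connector-java-5.1.23-bin.jar

Dropping the jar into $SPARK_HOME/jars/ instead also puts it on the classpath.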


I don't think your warehouse dir property is configured properly; it should be a path on HDFS:

<configuration>
  <property>
    <name>hive.metastore.uris</name>
    <value>thrift://maprdemo:9083</value>
  </property>
  <property>
    <name>hive.metastore.warehouse.dir</name>
    <value>/user/hive/warehouse</value>
  </property>
</configuration>

– dumitru
  • But it is a path in HDFS: `hive.metastore.warehouse.dir = hdfs://spark-master.cu:9000/value_iq/hive_warehouse/` – José Carlos Guevara Turruelles Jul 14 '17 at 09:32
  • Be sure that it's accessible; if you have Hue, navigate to it. If it's correct, then a hadoop fs -ls on it should yield the warehouse dir, but I would leave it like this: hdfs://value_iq/hive_warehouse/ (remove the address and port) – dumitru Jul 14 '17 at 09:38
  • Yes, it is accessible, but what does that have to do with the fact that the metastore creation script (or whatever they use) is failing because it tries to alter the table `partitions`, which doesn't exist: `ERROR Datastore: Error thrown executing ALTER TABLE PARTITIONS ADD COLUMN TBL_ID BIGINT NULL : Table 'hive_metastore.partitions' doesn't exist java.sql.SQLSyntaxErrorException: Table 'hive_metastore.partitions' doesn't exist` – José Carlos Guevara Turruelles Jul 14 '17 at 09:41
  • Do you have a database created? Try using ${dbname}.${tableName} in your statements. Also keep in mind that "partitions" is a keyword in Hive. – dumitru Jul 14 '17 at 09:48
  • I'm not executing any query yet; that query is executed by Spark or Hive while the Hive metastore is being created. I have set createDatabaseIfNotExist=true on the MySQL connection URL because there is no database; I'm starting from zero. – José Carlos Guevara Turruelles Jul 14 '17 at 09:51

I've tried different versions of Spark to migrate the metastore from Derby to MySQL/PostgreSQL, and I was only successful from Spark 2.2.1 onwards; the earlier versions wouldn't let me migrate to other databases. It's purely a version dependency in Spark. My suggestion: try upgrading Spark to 2.2.1 and you should be good to go!

– Nikhil

For all those who try to use the Hive metastore with MySQL driver 8.x, and for whom downgrading to 5.x helps:

Another solution is to configure the URL with MySQL options:

connectionURL: "jdbc:mysql://localhost:3306/hive_db?databaseTerm=SCHEMA&nullDatabaseMeansCurrent=true"

From the release notes of MySQL Connector/J 8:

A new connection property, databaseTerm, sets which of the two terms is used in an application to refer to a database. The property takes one of the two values CATALOG or SCHEMA and uses it to determine which Connection methods can be used to set/get the current database, which arguments can be used within the various DatabaseMetaData methods to filter results, and which fields in the ResultSet returned by DatabaseMetaData methods contain the database identification information. See the entry for databaseTerm in Configuration Properties for details. Also, the connection property nullCatalogMeansCurrent has been renamed to nullDatabaseMeansCurrent. The old name remains an alias for the connection property. Thanks to Harald Aamot for contributing to the patch. (Bug #11891000, Bug #27356869, Bug #89133)

For those interested: the problem we've seen is that Hive tries to scan all schemas in MySQL and has no permission to do so. Limiting it to the current database (hive_db, specified in the URL) solved the problem for us.
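
Applied to a hive-site.xml like the one in the question, this amounts to appending the two options to the JDBC URL; a sketch (host and database name taken from the question, keep whatever other URL options you need):

<property>
  <name>javax.jdo.option.ConnectionURL</name>
  <value>jdbc:mysql://home.cu:3306/hive_metastore?createDatabaseIfNotExist=true&amp;databaseTerm=SCHEMA&amp;nullDatabaseMeansCurrent=true</value>
  <description>JDBC connect string for a JDBC metastore</description>
</property>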

– Igor Berman