
I'm trying to load a table from an MS Access database.

I'm doing it like this:

val table = sparkSession.read
    .format("jdbc")
    .option("url", "jdbc:ucanaccess://D:/User/test.mdf;memory=false")
    .option("dbtable", "my_table")
    .load()
    .toDF

I added these dependencies (sbt sketch below):

  • ucanaccess-4.0.1
  • hsqldb-2.4.1
  • jackcess-2.1.6
  • commons-lang3-3.8.1
  • commons-logging-1.2
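
In sbt form that's roughly the following (a sketch assuming the standard Maven coordinates for these libraries; adjust to match your build):

    // assumed coordinates, not the exact build file from the project
    libraryDependencies ++= Seq(
      "net.sf.ucanaccess" % "ucanaccess" % "4.0.1",
      "org.hsqldb" % "hsqldb" % "2.4.1",
      "com.healthmarketscience.jackcess" % "jackcess" % "2.1.6",
      "org.apache.commons" % "commons-lang3" % "3.8.1",
      "commons-logging" % "commons-logging" % "1.2"
    )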

I get this exception:

Caused by: net.ucanaccess.jdbc.UcanaccessSQLException: UCAExc:::4.0.1 incompatible data type in conversion: from SQL type CHARACTER to java.lang.Integer, value: Maj_ID
at net.ucanaccess.jdbc.UcanaccessResultSet.getInt(UcanaccessResultSet.java:447)
at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anonfun$org$apache$spark$sql$execution$datasources$jdbc$JdbcUtils$$makeGetter$6.apply(JdbcUtils.scala:411)
at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anonfun$org$apache$spark$sql$execution$datasources$jdbc$JdbcUtils$$makeGetter$6.apply(JdbcUtils.scala:410)
at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anon$1.getNext(JdbcUtils.scala:347)
at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anon$1.getNext(JdbcUtils.scala:329)
at org.apache.spark.util.NextIterator.hasNext(NextIterator.scala:73)
    Is there a reason you don't use ucanaccess-4.0.4? Looking at the change log there is at least one issue related to CHAR fixed since 4.0.1 - I'm not suggesting that is your fix, but it should be simple to check if it's the case. – Simon Groenewolt Oct 15 '18 at 10:26
  • I agree with @SimonGroenewolt - Download [the latest version of UCanAccess](https://sourceforge.net/projects/ucanaccess/files/latest/download), run `console.bat` or `console.sh`, load the database file, and see if UCanAccess reports any errors. If not, then try `SELECT * FROM my_table;` and see if that works. – Gord Thompson Oct 15 '18 at 12:02

1 Answer


I had the same issue and found the answer here.

This happens because net.ucanaccess.jdbc.UcanaccessDriver has no dedicated JDBC dialect in Apache Spark, so Spark falls back to the default dialect, which escapes column names with double quotes. UCanAccess then treats the double-quoted names as string literals rather than identifiers, which is why the error shows the column name Maj_ID as the value being converted. For Access, identifiers should be escaped with the backtick character instead.

In recent versions of Spark (this should work in both 2.4 and 3.x), you can fix it by defining a dialect that quotes identifiers with backticks and registering it:

import org.apache.spark.sql.jdbc.{JdbcDialect, JdbcDialects}

// Dialect for UCanAccess: quote identifiers with backticks instead of double quotes
object UcanaccessDialect extends JdbcDialect {
  override def canHandle(url: String): Boolean =
    url.toLowerCase(java.util.Locale.ROOT).startsWith("jdbc:ucanaccess")
  override def quoteIdentifier(colName: String): String = s"`$colName`"
}

// Register the dialect once, before reading from the database
JdbcDialects.registerDialect(UcanaccessDialect)
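
Once the dialect is registered, the read from the question should work unchanged; a quick usage sketch reusing the same URL and table name:

    // the dialect above must be registered before this runs
    val table = sparkSession.read
      .format("jdbc")
      .option("url", "jdbc:ucanaccess://D:/User/test.mdf;memory=false")
      .option("dbtable", "my_table")
      .load() // load() already returns a DataFrame, so the trailing .toDF is not needed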
Cezary Drożak