
I am very new to Apache Spark and am trying to connect to Presto from Spark. Below is my connection string, which is giving an error.

val jdbcDF = spark.read.format("jdbc").options(Map(
  "url" -> "jdbc:presto://host:port/hive?user=username&SSL=true&SSLTrustStorePath=/path/certificatefile",
  "driver" -> "com.facebook.presto.jdbc.PrestoDriver",
  "dbtable" -> "tablename",
  "fetchSize" -> "10000",
  "partitionColumn" -> "columnname",
  "lowerBound" -> "1988",
  "upperBound" -> "2016",
  "numPartitions" -> "28")).load()

I first started start-master.sh from spark/sbin. I also tried setting the jar and driver class path in spark-shell like this:

./spark-shell  --driver-class-path com.facebook.presto.jdbc.PrestoDriver --jars /path/jar/file

I am still getting the error below:

java.sql.SQLException: Unsupported type JAVA_OBJECT
  at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$.org$apache$spark$sql$execution$datasources$jdbc$JdbcUtils$$getCatalystType(JdbcUtils.scala:251)
  at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anonfun$8.apply(JdbcUtils.scala:316)
  at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anonfun$8.apply(JdbcUtils.scala:316)
  at scala.Option.getOrElse(Option.scala:121)
  at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$.getSchema(JdbcUtils.scala:315)
  at org.apache.spark.sql.execution.datasources.jdbc.JDBCRDD$.resolveTable(JDBCRDD.scala:63)
  at org.apache.spark.sql.execution.datasources.jdbc.JDBCRelation$.getSchema(JDBCRelation.scala:210)
  at org.apache.spark.sql.execution.datasources.jdbc.JdbcRelationProvider.createRelation(JdbcRelationProvider.scala:35)
  at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:318)
  at org.apache.spark.sql.DataFrameReader.loadV1Source(DataFrameReader.scala:223)
  at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:211)
  at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:167)

Could someone please help me with this? Thanks.

jkat
  • Is your table using any complex types like array or map? Such types are exposed by Presto JDBC as JAVA_OBJECT (List or Map), and it looks like Spark does not support that. In general, support for complex types is not well defined in the JDBC specification. – kokosing Nov 27 '19 at 14:05
  • Why don't you use Spark SQL? – Ashish Nov 27 '19 at 17:40
  • @Ashish Spark SQL is what I want to use, but before that I have to connect to Presto and register a temporary view. – jkat Nov 27 '19 at 18:02
  • You can create a temp view in Spark as well. Using both Spark and Presto in a single flow may cause some maintenance overhead. – Ashish Nov 28 '19 at 07:43
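Following the first comment's diagnosis, one possible workaround is to push a subquery to Presto that serializes any complex-typed column to a string, so the JDBC driver reports VARCHAR instead of JAVA_OBJECT. This is a hedged sketch, not a confirmed fix: it assumes a hypothetical array column named `tags` in `tablename`, and uses Presto's `json_format(CAST(... AS JSON))` idiom to stringify it. The resulting map would be passed to `spark.read.format("jdbc").options(jdbcOptions).load()`.

```scala
// Sketch of a possible workaround for "Unsupported type JAVA_OBJECT",
// assuming `tags` is a hypothetical complex (array) column in `tablename`.
// The subquery stringifies it on the Presto side before Spark sees it.
val dbtable =
  """(SELECT columnname,
    |        json_format(CAST(tags AS JSON)) AS tags_json
    |   FROM tablename) t""".stripMargin

// Same JDBC options as in the question, but pointing "dbtable" at the
// subquery instead of the raw table.
val jdbcOptions = Map(
  "url"       -> "jdbc:presto://host:port/hive?user=username&SSL=true&SSLTrustStorePath=/path/certificatefile",
  "driver"    -> "com.facebook.presto.jdbc.PrestoDriver",
  "dbtable"   -> dbtable,
  "fetchSize" -> "10000")
```

Columns left in the subquery keep their simple types, so partitioning options (`partitionColumn`, `lowerBound`, `upperBound`, `numPartitions`) can still be added on top of this map. The stringified column can then be parsed back in Spark with `from_json` if the structure is needed.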

0 Answers