4

I am trying to read a Oracle table using spark.read.format and it works great for all tables except few tables which has any column with datatype as ROWID.

Below is my Code

var df = spark.read.format("jdbc").
        option("url", url).
        option("driver", driver).
        option("dbtable", dbTable).load()
println(df.first)

I am getting the below error

18/09/08 11:38:17 WARN scheduler.TaskSetManager: Lost task 0.0 in stage 5.0 (TID 23, gbrdsr000002985.intranet.barcapint.com, executor 21): java.sql.SQLException: Invalid column type: getLong not implemented for class oracle.jdbc.driver.T4CRowidAccessor
    at oracle.jdbc.driver.GeneratedAccessor.getLong(GeneratedAccessor.java:440)
    at oracle.jdbc.driver.GeneratedStatement.getLong(GeneratedStatement.java:228)
    at oracle.jdbc.driver.GeneratedScrollableResultSet.getLong(GeneratedScrollableResultSet.java:620)
    at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anonfun$org$apache$spark$sql$execution$datasources$jdbc$JdbcUtils$$makeGetter$8.apply(JdbcUtils.scala:365)
    at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anonfun$org$apache$spark$sql$execution$datasources$jdbc$JdbcUtils$$makeGetter$8.apply(JdbcUtils.scala:364)

Analysis: I have checked JdbcUtils object and found that case java.sql.Types.ROWID is mapped to LongType of spark. But getLong is not implemented for ROWID type in Oracle JDBC driver. I am confident that if i manage to map it to StringType then it will work but I could not found any workaround for this. Please suggest the solution here.

As of now, I am reading the data using ResultSet and later converting it to DataFrame using custom schema but it kills parallelism. Also wanted to check if any option to modify code of JdbcUtils as I am not able to extend DataFrameReader to create a custom one since "private[sql]" is mentioned in class definition.

Arghya Saha
  • 227
  • 1
  • 4
  • 17

1 Answers1

2
val df=spark.read
  .format("jdbc")
  .option("driver","oracle.jdbc.OracleDriver")
  .option("url","jdbc:oracle:thin:@localhost:1521:orcl")
  .option("user","oracle1")
  .option("password","oracle1")
  .option("dbtable","(select Cast(RID as VARCHAR2(18)) from sample.ALL_RESULTS_DATA)  my_table")
  .load()

RID is a column name of data type ROWID in oracle.

The above code will cast the ROWID data type to String so that we can read it using Spark

RESULT:

-------------------------+
|CAST(RIDASVARCHAR2(18))AS|
+-------------------------+
|       AAAIVuAABAAAMhCAAA|
|       AAAIVuAABAAAMhCAAA|
|       AAAIVuAABAAAMhCAAA|
|       AAAIVuAABAAAMhCAAA|
|       AAAIVuAABAAAMhCAAA|
|       AAAIVuAABAAAMhCAAA|
|       AAAIVuAABAAAMhCAAA|
|       AAAIVuAABAAAMhCAAA|
|       AAAIVuAABAAAMhCAAA|
|       AAAIVuAABAAAMhCAAA|
|       AAAIVuAABAAAMhCAAA|
|       AAAIVuAABAAAMhCAAA|
|       AAAIVuAABAAAMhCAAA|
|       AAAIVuAABAAAMhCAAA|
|       AAAIVuAABAAAMhCAAA|
|       AAAIVuAABAAAMhCAAA|
|       AAAIVuAABAAAMhCAAA|
|       AAAIVuAABAAAMhCAAA|
|       AAAIVuAABAAAMhCAAA|
|       AAAIVuAABAAAMhCAAA|