Cloud Data Fusion Oracle Source Preview Error

Question

I have a question for clarification as well as a 2 errors using cloud data fusion: Background: Creating a pipeline to move data from a single table in Oracle (version 11.2.0.4, local server) into BigQuery using cloud data fusion. I have downloaded Oracle JDBC driver from instant client 11.2.0.4.0 and used the ojdbc6.jar files as the oracle driver for the below deployments. If I use a newer one I get a date error.

Clarity: There appear to be 3 places to load a OJDBC driver:
(a) Wrangler list of databases (ex: sbl), click edit and it will ask me to install an Oracle Driver or Oracle Thin Driver. I am required to enter the Oracle class (which I copy of the examples oracle.jdbc.driver.OracleDriver). If I do not do this one and select the thin client I get an error when I try to view the database objects under Wrangler->Database. If I only do this and not one of the others then I get an message when validating the source in the pipeline that the oracle plugin has not been deployed.

(b) Click HUB and select Drivers I see an option for Oracle 12c JDBC Drver v12c. If I select the Oracle option I get a prompt to download the Oracle 12c JDBC Driver and then deploy that driver by dragging the driver into the window (no option to configure at this point).

(c) Click HUB and Plugins and I see options for Oracle Export Action Plugin v 1.7.0 or Oracle Plugins v 1.2.0. If I click the Oracle Plugins option I get an option to "Deploy".

For clarify, which of these options or combinations do I need to use to deploy the appropriate driver and plugins to access an oracle database as a source?

Error 1: If I use the (c) option from above I will see an Oracle option in the source. I select that and enter my driver name (same name as I gave above (ex: ojdbc6), host, connection credentials and all relevant information. When I select the Validate button I get the following error: Plugin named 'Oracle' of type 'batchsource' not found. Make sure the plugin 'Oracle' of type 'batchsource' is already deployed. Not sure of the options above to use to deploy and I've tried different combinations with no success. Would prefer to use the Oracle source as an option.

Error 2: I select "Database" as the source (not Oracle). I enter in the plugin name (which matches the one I used in option (a) from above. I enter in my connection string: jdbc:oracle:thin@1.1.20.1:1521:sbl and all of the connection information. I validate and it passes and populates the output schema (with the correct columns and data types). I select BigQuery as my Sink and then connect the arrows between the source and the sink. The sink is updated to include the source and target columns appropriately. I validate and it passes. I then "Preview" and "Run". Roughly 31 seconds in I get the following warning and error message:

Warning:  In spark 1.0 and later spark.local.dir will be overridden by the value set by the cluster manager (via spark_local_dirs in mesos/standalone and local_dirs in yarn)

org.apache.spark.SparkConf#66-spark-submitter-phase-1-e4706b9a-3c7e-11ea-bb5c-36e9df22dd3d

Error:  
org.apache.spark.executor.Executor#91-Executor task launch worker for task 0    E

java.lang.NullPointerException: null
    at org.apache.hadoop.mapreduce.lib.db.DataDrivenDBInputFormat.createDBRecordReader(DataDrivenDBInputFormat.java:281) ~[hadoop-mapreduce-client-core-2.9.2.jar:na]
    at io.cdap.plugin.db.batch.source.DataDrivenETLDBInputFormat.createDBRecordReader(DataDrivenETLDBInputFormat.java:124) ~[1579632628793-0/:na]
    at org.apache.hadoop.mapreduce.lib.db.DBInputFormat.createRecordReader(DBInputFormat.java:245) ~[hadoop-mapreduce-client-core-2.9.2.jar:na]
    at io.cdap.cdap.etl.batch.preview.LimitingInputFormat.createRecordReader(LimitingInputFormat.java:51) ~[cdap-etl-core-6.1.0.jar:na]
    at org.apache.spark.rdd.NewHadoopRDD$$anon$1.liftedTree1$1(NewHadoopRDD.scala:187) ~[spark-core_2.11-2.3.3.jar:2.3.3]
    at org.apache.spark.rdd.NewHadoopRDD$$anon$1.<init>(NewHadoopRDD.scala:186) ~[spark-core_2.11-2.3.3.jar:2.3.3]
    at org.apache.spark.rdd.NewHadoopRDD.compute(NewHadoopRDD.scala:141) ~[spark-core_2.11-2.3.3.jar:2.3.3]
    at org.apache.spark.rdd.NewHadoopRDD.compute(NewHadoopRDD.scala:70) ~[spark-core_2.11-2.3.3.jar:2.3.3]
    at io.cdap.cdap.app.runtime.spark.data.DatasetRDD.compute(DatasetRDD.scala:58) ~[io.cdap.cdap.cdap-spark-core2_2.11-6.1.0.jar:na]
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324) ~[spark-core_2.11-2.3.3.jar:2.3.3]
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:288) ~[spark-core_2.11-2.3.3.jar:2.3.3]
    at org.apache.spark.rdd.UnionRDD.compute(UnionRDD.scala:105) ~[spark-core_2.11-2.3.3.jar:2.3.3]
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324) ~[spark-core_2.11-2.3.3.jar:2.3.3]
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:288) ~[spark-core_2.11-2.3.3.jar:2.3.3]
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:49) ~[spark-core_2.11-2.3.3.jar:2.3.3]
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324) ~[spark-core_2.11-2.3.3.jar:2.3.3]
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:288) ~[spark-core_2.11-2.3.3.jar:2.3.3]
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:49) ~[spark-core_2.11-2.3.3.jar:2.3.3]
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324) ~[spark-core_2.11-2.3.3.jar:2.3.3]
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:288) ~[spark-core_2.11-2.3.3.jar:2.3.3]
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:49) ~[spark-core_2.11-2.3.3.jar:2.3.3]
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324) ~[spark-core_2.11-2.3.3.jar:2.3.3]
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:288) ~[spark-core_2.11-2.3.3.jar:2.3.3]
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87) ~[spark-core_2.11-2.3.3.jar:2.3.3]
    at org.apache.spark.scheduler.Task.run(Task.scala:109) ~[spark-core_2.11-2.3.3.jar:2.3.3]
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:345) ~[spark-core_2.11-2.3.3.jar:2.3.3]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [na:1.8.0_232]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [na:1.8.0_232]
    at java.lang.Thread.run(Thread.java:748) [na:1.8.0_232]

We're having exact same issues as you. Any luck with this yet? — Korean_Of_the_Mountain, Feb 23 '20 at 22:48
Also, are you able to view the table of interest in Wrangler? As opposed just being able to import schema in Studio. — Korean_Of_the_Mountain, Feb 23 '20 at 22:52
Additionally seeing this issue as well. If you were able to resolve this issue advice would be helpful. — FridayPush, Mar 19 '20 at 05:46

score 0 · Answer 1 · answered Apr 07 '20 at 02:37

Sorry for late response. This is a known issue in current Cloud Data Fusion version (https://issues.cask.co/browse/CDAP-16453). Preview with DB plugin will not work in current version. So please try it on deployed pipeline.

This issue is fixed in upcoming version releasing soon.

Cloud Data Fusion Oracle Source Preview Error

1 Answers1