
I am trying to create a data pipeline from SQL Server (running on a GCP VM) to BigQuery using Cloud Data Fusion. I have done all of the setup configuration below:

  1. Created a new instance in Cloud Data Fusion.
  2. Added its service account in IAM & Admin.
  3. Installed the JDBC driver for the SQL Server plugin (connection string sketch below).
  4. Created a wrangler and read the data from SQL Server using the SQL Server plugin (in this step I can successfully authenticate to my SQL Server and see my table data).
  5. Completed the pipeline config by adding BigQuery as a sink.
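
For context, the connection in step 3/4 goes through a standard Microsoft JDBC connection string; a minimal sketch looks like the line below, where the host and database name are placeholders rather than my actual values:

jdbc:sqlserver://<vm-host-or-ip>:1433;databaseName=<my-database>

(1433 is SQL Server's default port; adjust it if your instance listens elsewhere.)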

When I run the pipeline, it ends up with a few errors; I have tried a few Google searches but didn't find the answer.

I was able to create a Data Fusion pipeline from GCS to BigQuery and it worked fine, but this SQL Server to BigQuery pipeline shows an error.

Could anyone please help me with this?

Here are the error details:

2020-01-10 13:00:47,528 - WARN [Thread-95:o.a.h.m.LocalJobRunner@589] - job_local976595976_0001
java.lang.Exception: java.lang.NullPointerException
    at org.apache.hadoop.mapred.LocalJobRunner$Job.runTasks(LocalJobRunner.java:491) ~[hadoop-mapreduce-client-common-2.9.2.jar:na]
    at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:551) ~[hadoop-mapreduce-client-common-2.9.2.jar:na]
java.lang.NullPointerException: null
    at org.apache.hadoop.mapreduce.lib.db.DataDrivenDBInputFormat.createDBRecordReader(DataDrivenDBInputFormat.java:281) ~[hadoop-mapreduce-client-core-2.9.2.jar:na]
    at io.cdap.plugin.db.batch.source.DataDrivenETLDBInputFormat.createDBRecordReader(DataDrivenETLDBInputFormat.java:124) ~[1578661227434-0/:na]
    at org.apache.hadoop.mapreduce.lib.db.DBInputFormat.createRecordReader(DBInputFormat.java:245) ~[hadoop-mapreduce-client-core-2.9.2.jar:na]
    at io.cdap.cdap.etl.batch.preview.LimitingInputFormat.createRecordReader(LimitingInputFormat.java:51) ~[cdap-etl-core-6.1.0.jar:na]
    at io.cdap.cdap.internal.app.runtime.batch.dataset.input.MultiInputFormat.createRecordReader(MultiInputFormat.java:92) ~[na:na]
    at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.<init>(MapTask.java:521) ~[hadoop-mapreduce-client-core-2.9.2.jar:na]
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764) ~[hadoop-mapreduce-client-core-2.9.2.jar:na]
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341) ~[hadoop-mapreduce-client-core-2.9.2.jar:na]
    at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:270) ~[hadoop-mapreduce-client-common-2.9.2.jar:na]
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) ~[na:1.8.0_232]
    at java.util.concurrent.FutureTask.run(FutureTask.java:266) ~[na:1.8.0_232]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) ~[na:1.8.0_232]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) ~[na:1.8.0_232]
    at java.lang.Thread.run(Thread.java:748) ~[na:1.8.0_232]
2020-01-10 13:00:50,841 - ERROR [MapReduceRunner-phase-1:i.c.c.i.a.r.ProgramControllerServiceAdapter@97] - MapReduce Program 'phase-1' failed.
java.lang.IllegalStateException: MapReduce JobId job_local976595976_0001 failed
    at com.google.common.base.Preconditions.checkState(Preconditions.java:176) ~[com.google.guava.guava-13.0.1.jar:na]
    at io.cdap.cdap.internal.app.runtime.batch.MapReduceRuntimeService.run(MapReduceRuntimeService.java:416) ~[na:na]
    at com.google.common.util.concurrent.AbstractExecutionThreadService$1$1.run(AbstractExecutionThreadService.java:52) ~[com.google.guava.guava-13.0.1.jar:na]
    at io.cdap.cdap.internal.app.runtime.batch.MapReduceRuntimeService$2$1.run(MapReduceRuntimeService.java:450) [na:na]
    at java.lang.Thread.run(Thread.java:748) [na:1.8.0_232]
2020-01-10 13:00:50,842 - ERROR [MapReduceRunner-phase-1:i.c.c.i.a.r.ProgramControllerServiceAdapter@98] - MapReduce program 'phase-1' failed with error: MapReduce JobId job_local976595976_0001 failed. Please check the system logs for more details.
java.lang.IllegalStateException: MapReduce JobId job_local976595976_0001 failed
    at com.google.common.base.Preconditions.checkState(Preconditions.java:176) ~[com.google.guava.guava-13.0.1.jar:na]
    at io.cdap.cdap.internal.app.runtime.batch.MapReduceRuntimeService.run(MapReduceRuntimeService.java:416) ~[na:na]
    at com.google.common.util.concurrent.AbstractExecutionThreadService$1$1.run(AbstractExecutionThreadService.java:52) ~[com.google.guava.guava-13.0.1.jar:na]
    at io.cdap.cdap.internal.app.runtime.batch.MapReduceRuntimeService$2$1.run(MapReduceRuntimeService.java:450) ~[na:na]
    at java.lang.Thread.run(Thread.java:748) [na:1.8.0_232]
2020-01-10 13:00:50,916 - ERROR [WorkflowDriver:i.c.c.d.SmartWorkflow@552] - Pipeline '0f084034-33a9-11ea-95f6-8e2648ebe039' failed.
2020-01-10 13:00:51,225 - ERROR [WorkflowDriver:i.c.c.i.a.r.w.WorkflowProgramController@89] - Workflow service 'workflow.default.0f084034-33a9-11ea-95f6-8e2648ebe039.DataPipelineWorkflow.20288f05-33a9-11ea-a505-8e2648ebe039' failed.
java.lang.IllegalStateException: MapReduce JobId job_local976595976_0001 failed
    at com.google.common.base.Preconditions.checkState(Preconditions.java:176) ~[com.google.guava.guava-13.0.1.jar:na]
    at io.cdap.cdap.internal.app.runtime.batch.MapReduceRuntimeService.run(MapReduceRuntimeService.java:416) ~[na:na]
    at com.google.common.util.concurrent.AbstractExecutionThreadService$1$1.run(AbstractExecutionThreadService.java:52) ~[com.google.guava.guava-13.0.1.jar:na]
    at io.cdap.cdap.internal.app.runtime.batch.MapReduceRuntimeService$2$1.run(MapReduceRuntimeService.java:450) ~[na:na]
    at java.lang.Thread.run(Thread.java:748) [na:1.8.0_232]

  • Experiencing the same error. I've previously gotten this working, as well as with the SQL Multi-table plugin. One difference is that I'm using Postgres and the Postgres JDBC driver. Let me know if you find a solution, please. – FridayPush Jan 16 '20 at 23:42
  • Still looking for the answer. I believe we need to do some configuration setting because I don't see any issues in the pipeline itself. Sure, I will let you know if I find any solution. Please also let me know in the meantime if you find a way to do this. – raj Jan 19 '20 at 01:29

2 Answers


As per the issue reported, your run fails with a java.lang.NullPointerException, which is thrown when a null reference is used where an object is required somewhere in the application's run path.

Assuming that you've successfully configured the JDBC driver, I would recommend checking the source Database Properties across your pipeline to determine the undefined field. A likely candidate is the Import Query property field, which is used to import data from the specified table by supplying a SELECT query with an appropriate $CONDITIONS clause whenever the number of splits to generate is more than 1:

SELECT * FROM <table> WHERE $CONDITIONS
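
For illustration, Hadoop's DataDrivenDBInputFormat (visible in your stack trace) replaces $CONDITIONS at runtime with per-split boundary predicates over the Split-By field, so each split reads a disjoint slice of the table. A sketch with a hypothetical table and split column (dbo.orders and order_id are placeholders, not names from your setup):

-- Import Query, with Split-By Field Name = order_id and Number of Splits = 4
SELECT * FROM dbo.orders WHERE $CONDITIONS
-- at runtime, one of the four splits effectively executes something like:
SELECT * FROM dbo.orders WHERE ( order_id >= 1 ) AND ( order_id < 2500 )

If $CONDITIONS is missing while the split count is greater than 1, the substitution has nowhere to land, which is one common way this source stage breaks.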
– Nick_Kh
  • Thanks for your response. Actually, I have already checked and updated the SQL query; even after that it's showing the error. It's very weird. Did I miss any other process here? – raj Jan 14 '20 at 12:06
  • Did you find any solution? I'm facing the same issue. – Balajee Venkatesh Feb 15 '20 at 12:48
  • Hi Balajee Venkatesh, no, still facing the same issue. If you find any solution please let me know. I will also update this thread if I find the answer. – raj Feb 18 '20 at 09:42
  • Hi everyone, if anyone has an answer for this question, please let me know. – raj Mar 10 '20 at 17:32

UPDATE: this is a known issue, tracked at https://issues.cask.co/browse/CDAP-16453 and fixed in 6.1.2.

"Same error on MySQL 5.x Strange enough, if you deploy the pipeline and run it it works... I'm thinking about decoupling pipelines to have small sql-to-storage and the big pipeline in the outgoing flow"

Regards, Virgilio