I am trying to test the following library: https://tech.scribd.com/blog/2021/introducing-sql-delta-import.html

I want to copy data from my SQL database to a data lake in the Delta format. I have created a mount point, the databases, and an empty Delta table. What I am trying to do now is run a Databricks job with the following parameters:

["--class","io.delta.connectors.spark.JDBC.ImportRunner",
 "/jars/sql_delta_import_2_12_0_2_1_SNAPSHOT.jar",
"jdbc:sqlserver:/myserver.database.windows.net:1433;database=mydatabase;user=myuser;password=mypass;encrypt=true;trustServerCertificate=false;hostNameInCertificate=*.database.windows.net;loginTimeout=30",
"sourcedb.sourcetable",
"targetdb.targettable",
"PersonID"]

What I am getting is:

OpenJDK 64-Bit Server VM warning: ignoring option MaxPermSize=512m; support was removed in 8.0
Warning: Ignoring non-Spark config property: libraryDownload.sleepIntervalSeconds
Warning: Ignoring non-Spark config property: libraryDownload.timeoutSeconds
Warning: Ignoring non-Spark config property: eventLog.rolloverIntervalSeconds
Error: Failed to load class io.delta.connectors.spark.JDBC.ImportRunner.

The log shows the jar file being fetched, so the job was able to find it:

21/05/07 10:08:15 INFO Utils: Fetching dbfs:/jars/sql_delta_import_2_12_0_2_1_SNAPSHOT.jar to /local_disk0/tmp/spark-76e146dd-835d-4ddf-9b3b-f32d75c3cba2/fetchFileTemp8075862365042488320.tmp

I have unpacked the jar file and the class path exists in it, so I am not sure what the cause could be. This is my first encounter with Scala, so I would appreciate any advice, since I am a bit lost.

stdout output:

2021-05-07T13:46:40.434+0000: [GC (Allocation Failure) [PSYoungGen: 56320K->8326K(65536K)] 56320K->8342K(216064K), 0.0083761 secs] [Times: user=0.01 sys=0.01, real=0.01 secs] 
2021-05-07T13:46:40.884+0000: [GC (Allocation Failure) [PSYoungGen: 64646K->7553K(65536K)] 64662K->7577K(216064K), 0.0076350 secs] [Times: user=0.01 sys=0.00, real=0.00 secs] 
2021-05-07T13:46:41.367+0000: [GC (Allocation Failure) [PSYoungGen: 63873K->8972K(65536K)] 63897K->9004K(216064K), 0.0069414 secs] [Times: user=0.01 sys=0.01, real=0.01 secs] 
2021-05-07T13:46:41.422+0000: [GC (Metadata GC Threshold) [PSYoungGen: 25702K->6172K(121856K)] 25734K->6212K(272384K), 0.0058830 secs] [Times: user=0.01 sys=0.01, real=0.00 secs] 
2021-05-07T13:46:41.428+0000: [Full GC (Metadata GC Threshold) [PSYoungGen: 6172K->0K(121856K)] [ParOldGen: 40K->6082K(78336K)] 6212K->6082K(200192K), [Metaspace: 20079K->20079K(1067008K)], 0.0277412 secs] [Times: user=0.06 sys=0.01, real=0.03 secs] 
2021-05-07T13:46:42.235+0000: [GC (Allocation Failure) [PSYoungGen: 112640K->7697K(121856K)] 118722K->13851K(200192K), 0.0088850 secs] [Times: user=0.01 sys=0.01, real=0.01 secs] 
2021-05-07T13:46:42.745+0000: [GC (Allocation Failure) [PSYoungGen: 120337K->5906K(195584K)] 126491K->12068K(273920K), 0.0112406 secs] [Times: user=0.02 sys=0.01, real=0.01 secs] 
2021-05-07T13:46:42.881+0000: [GC (Metadata GC Threshold) [PSYoungGen: 28380K->4154K(197632K)] 34542K->10324K(275968K), 0.0055152 secs] [Times: user=0.02 sys=0.00, real=0.00 secs] 
2021-05-07T13:46:42.886+0000: [Full GC (Metadata GC Threshold) [PSYoungGen: 4154K->0K(197632K)] [ParOldGen: 6170K->9617K(121344K)] 10324K->9617K(318976K), [Metaspace: 33488K->33488K(1079296K)], 0.0902208 secs] [Times: user=0.32 sys=0.01, real=0.09 secs] 
2021-05-07T13:46:44.026+0000: [GC (Allocation Failure) [PSYoungGen: 187904K->8219K(273920K)] 197521K->17845K(395264K), 0.0091905 secs] [Times: user=0.01 sys=0.01, real=0.01 secs] 
Heap
 PSYoungGen      total 273920K, used 195530K [0x000000073d980000, 0x0000000752280000, 0x00000007c0000000)
  eden space 265216K, 70% used [0x000000073d980000,0x000000074906b950,0x000000074dc80000)
  from space 8704K, 94% used [0x0000000751a00000,0x0000000752206f00,0x0000000752280000)
  to   space 10240K, 0% used [0x0000000750e80000,0x0000000750e80000,0x0000000751880000)
 ParOldGen       total 121344K, used 9625K [0x0000000638c00000, 0x0000000640280000, 0x000000073d980000)
  object space 121344K, 7% used [0x0000000638c00000,0x0000000639566708,0x0000000640280000)
 Metaspace       used 50422K, capacity 52976K, committed 53248K, reserved 1095680K
  class space    used 6607K, capacity 6877K, committed 6912K, reserved 1048576K
Alex Ott
Grevioos
  • Can you post the complete stack trace? I'm especially looking for lines like "Caused by". – Alex Ott May 07 '21 at 13:28
  • @AlexOtt Thank you for your answer. I believe it's a bit too much to copy and paste, so I have uploaded it here: https://www.dropbox.com/sh/782u3vmsg9e6i53/AAB9lvvFAUOlYJU7-BuGwM0ba?dl=0 – Grevioos May 07 '21 at 14:21
  • @Grevioos Might it be the version of the runtime you are using? – Arthur Clerc-Gherardi May 07 '21 at 15:06
  • @ArthurClerc-Gherardi I have tried a few different ones already. – Grevioos May 07 '21 at 15:37
  • @ArthurClerc-Gherardi Although on 6.4 I received com.amazonaws.SdkClientException: The requested metadata is not found, so I have started to wonder if this library could be somewhat AWS-specific. – Grevioos May 07 '21 at 15:48
  • @Grevioos I don't think Delta is built upon AWS :D Maybe you can check their GitHub, it is open source. When did you get this SDK error, which is totally different from the previous one? – Arthur Clerc-Gherardi May 10 '21 at 09:28

1 Answer

So the actual issue was the io.delta.connectors.spark.JDBC.ImportRunner part. I had copy-pasted it from the blog, but the package name is actually lowercase: io.delta.connectors.spark.jdbc.ImportRunner. JVM class names are case-sensitive, so the uppercase JDBC variant could not be loaded.
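
For anyone hitting the same "Failed to load class" error, here is a minimal sketch of how the exact class name could be checked against the jar itself rather than against an unpacked copy (on a case-insensitive file system an unpacked path can appear to exist even when the case differs). The helper name find_class is hypothetical, and the demo uses a small in-memory stand-in jar, not the real sql-delta-import artifact.

```python
import io
import zipfile


def find_class(jar, class_name):
    """Check whether a fully qualified class name exists in a jar.

    Returns (exact, near): `exact` is True when the name matches an entry
    with the same case; `near` lists the class names that match when case
    is ignored. `jar` may be a file path or a file-like object.
    """
    wanted = class_name.replace(".", "/") + ".class"
    with zipfile.ZipFile(jar) as archive:
        entries = [n for n in archive.namelist() if n.endswith(".class")]
    exact = wanted in entries
    near = [
        n[: -len(".class")].replace("/", ".")
        for n in entries
        if n.lower() == wanted.lower()
    ]
    return exact, near


# Stand-in jar built in memory for illustration only (an assumption, not the
# real jar): it contains a single class entry in the lowercase jdbc package.
demo = io.BytesIO()
with zipfile.ZipFile(demo, "w") as archive:
    archive.writestr("io/delta/connectors/spark/jdbc/ImportRunner.class", b"")
demo.seek(0)

exact, near = find_class(demo, "io.delta.connectors.spark.JDBC.ImportRunner")
print(exact)  # False: no entry with this exact case
print(near)   # ['io.delta.connectors.spark.jdbc.ImportRunner']
```

With a real jar, passing its local path instead of the in-memory object should work the same way; if exact is False and near is not empty, the value passed to --class most likely has the wrong case.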

Grevioos