
How can I transform my data in Databricks workspace 1 (DBW1) and then push it (send/save the table) to another Databricks workspace (DBW2)?

On DBW1 I installed the Simba Spark JDBC driver.

Then I tried:

# Attempted JDBC write from DBW1 to a cluster in DBW2,
# authenticating with a personal access token (AuthMech=3)
(df.write
 .format("jdbc")
 .options(
   url="jdbc:spark://<DBW2-url>:443/default;transportMode=http;ssl=1;httpPath=<http-path-of-cluster>;AuthMech=3;UID=<uid>;PWD=<pat>",
   driver="com.simba.spark.jdbc.Driver",
   dbtable="default.fromDBW1"
 )
 .save()
)

However, when I run it I get:

java.sql.SQLException: [Simba][SparkJDBCDriver](500051) ERROR processing query/statement. Error Code: 0, SQL state: org.apache.hive.service.cli.HiveSQLException: Error running query: org.apache.spark.sql.catalyst.parser.ParseException: 

How can I do this correctly?

Note: each DBW is in a different Azure subscription.

romanzdk

1 Answer


From my point of view, the more scalable way would be to write directly into ADLS instead of using JDBC. But it needs to be done as follows:

  • You need a separate storage account for your data. In any case, using the DBFS root to store actual data isn't recommended, as it isn't accessible from outside the workspace - that makes things like migration more complicated.

  • You need a way to access that storage account (ADLS or Blob storage). You can access the data directly via abfss:// or wasbs:// URLs, e.g. using a service principal or a storage account access key (see the sketch after this list).

  • In the target workspace you just create a table on top of the written data - a so-called unmanaged table. Just do (see doc):

create table <name>
using delta
location 'path_or_url_to_data'
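
A minimal sketch of the first two steps, run in DBW1 - assuming a service principal that has access to the shared storage account; the storage account name, container, secret scope, and path below are hypothetical placeholders:

# Run in DBW1. Sketch only: the storage account, container, secret scope,
# and client/tenant ids are hypothetical placeholders.
acct = "mystorageacct.dfs.core.windows.net"

spark.conf.set(f"fs.azure.account.auth.type.{acct}", "OAuth")
spark.conf.set(f"fs.azure.account.oauth.provider.type.{acct}",
               "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider")
spark.conf.set(f"fs.azure.account.oauth2.client.id.{acct}", "<sp-client-id>")
spark.conf.set(f"fs.azure.account.oauth2.client.secret.{acct}",
               dbutils.secrets.get(scope="my-scope", key="sp-secret"))
spark.conf.set(f"fs.azure.account.oauth2.client.endpoint.{acct}",
               "https://login.microsoftonline.com/<tenant-id>/oauth2/token")

# Write the transformed data as Delta files straight into the shared container
path = f"abfss://shared@{acct}/tables/fromDBW1"
df.write.format("delta").mode("overwrite").save(path)

DBW2 then points its create table at the same abfss:// path - no data needs to be copied between the workspaces.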
Alex Ott
  • Yeah, I consider this more of a backup approach because of that 3rd step - you have to copy the data, then connect to the second workspace and run a notebook to actually create the tables - that's why I wanted to copy it directly via JDBC. What do you think? – romanzdk Jan 17 '22 at 16:17
  • You don’t have to copy anything. Both workspaces can read the same files. – David Browne - Microsoft Jan 17 '22 at 16:21
  • Agree with David - you just run `create table` once - I've added an example – Alex Ott Jan 17 '22 at 16:23
  • The intention was to transform the data in subscription A (with DBW1 and its corresponding ADLS) and then push the transformed data into subscription B (with DBW2 and its corresponding ADLS). With JDBC, the database tables inside DBW2 would have been created straight away. With your approach there is one more step - actually creating those database tables - exactly as your last step describes. And to be able to create the tables (and automate it), I would need a notebook task for that. – romanzdk Jan 17 '22 at 19:03
  • So I would need to 1. copy the data to ADLS and then 2. run a notebook task in DBW2 to create the tables. – romanzdk Jan 17 '22 at 19:03
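
A sketch of that notebook task in DBW2 (step 2 in the comment above), using the same hypothetical abfss:// path:

# Run once in DBW2, e.g. as a job after the DBW1 write finishes.
# Registers an unmanaged table over the Delta files written by DBW1;
# the abfss:// path is a hypothetical placeholder.
spark.sql("""
  create table if not exists default.fromDBW1
  using delta
  location 'abfss://shared@mystorageacct.dfs.core.windows.net/tables/fromDBW1'
""")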