
How can I transform my data in Databricks workspace 1 (DBW1) and then push it (send/save the table) to another Databricks workspace (DBW2)?

On DBW1 I installed the Simba Spark JDBC driver.

Then I tried:

# Attempted JDBC write from DBW1 to a cluster in DBW2,
# authenticating with a personal access token (AuthMech=3)
(df.write
 .format("jdbc")
 .options(
   url="jdbc:spark://<DBW2-url>:443/default;transportMode=http;ssl=1;httpPath=<http-path-of-cluster>;AuthMech=3;UID=<uid>;PWD=<pat>",
   driver="com.simba.spark.jdbc.Driver",
   dbtable="default.fromDBW1"
 )
 .save()
)

However, when I run it I get:

java.sql.SQLException: [Simba][SparkJDBCDriver](500051) ERROR processing query/statement. Error Code: 0, SQL state: org.apache.hive.service.cli.HiveSQLException: Error running query: org.apache.spark.sql.catalyst.parser.ParseException: 

How can I do this correctly?

Note: each DBW is in a different Azure subscription.

romanzdk

1 Answer


From my point of view, the more scalable way would be to write directly into ADLS instead of using JDBC. But it needs to be done as follows:

  • You need a separate storage account for your data. In any case, using the DBFS root to store actual data isn't recommended, as it isn't accessible from outside the workspace - that makes things like migration more complicated.

  • You need a way to access that storage account (ADLS or Blob storage). You can access the data directly via abfss:// or wasbs:// URLs, e.g. using a service principal or a storage account access key (see the sketch after this list).

  • In the target workspace you just create a table on top of the written data - a so-called unmanaged table. Just do (see doc):

create table <name>
using delta
location 'path_or_url_to_data'
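
A minimal sketch of the first two steps, run in DBW1 - assuming a service principal that has access to the shared storage account; the storage account name, container, secret scope, and path below are hypothetical placeholders:

# Run in DBW1. Sketch only: the storage account, container, secret scope,
# and client/tenant ids are hypothetical placeholders.
acct = "mystorageacct.dfs.core.windows.net"

spark.conf.set(f"fs.azure.account.auth.type.{acct}", "OAuth")
spark.conf.set(f"fs.azure.account.oauth.provider.type.{acct}",
               "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider")
spark.conf.set(f"fs.azure.account.oauth2.client.id.{acct}", "<sp-client-id>")
spark.conf.set(f"fs.azure.account.oauth2.client.secret.{acct}",
               dbutils.secrets.get(scope="my-scope", key="sp-secret"))
spark.conf.set(f"fs.azure.account.oauth2.client.endpoint.{acct}",
               "https://login.microsoftonline.com/<tenant-id>/oauth2/token")

# Write the transformed data as Delta files straight into the shared container
path = f"abfss://shared@{acct}/tables/fromDBW1"
df.write.format("delta").mode("overwrite").save(path)

DBW2 then points its create table at the same abfss:// path - no data needs to be copied between the workspaces.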
Alex Ott
  • Yeah, I consider this more of a backup approach because of that 3rd step - you have to copy the data, then connect to the second workspace and run a notebook to actually create the tables - that's why I wanted to copy it directly via JDBC. What do you think? – romanzdk Jan 17 '22 at 16:17
  • You don’t have to copy anything. Both workspaces can read the same files. – David Browne - Microsoft Jan 17 '22 at 16:21
  • Agree with David - you just run `create table` once - I've added an example – Alex Ott Jan 17 '22 at 16:23
  • The intention was to transform the data in subscription A (with DBW1 and its corresponding ADLS) and then push the transformed data into subscription B (with DBW2 and its corresponding ADLS). With JDBC, the database tables inside DBW2 would have been created straight away. With your approach there is one more step - actually creating those database tables - exactly as your last step describes. And to be able to create the tables (and automate it), I would need a notebook task for that. – romanzdk Jan 17 '22 at 19:03
  • So I would need to 1. copy the data to ADLS and then 2. run a notebook task in DBW2 to create the tables. – romanzdk Jan 17 '22 at 19:03
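
A sketch of that notebook task in DBW2 (step 2 in the comment above), using the same hypothetical abfss:// path:

# Run once in DBW2, e.g. as a job after the DBW1 write finishes.
# Registers an unmanaged table over the Delta files written by DBW1;
# the abfss:// path is a hypothetical placeholder.
spark.sql("""
  create table if not exists default.fromDBW1
  using delta
  location 'abfss://shared@mystorageacct.dfs.core.windows.net/tables/fromDBW1'
""")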