
I'm looking for a way to connect to Databricks Delta Lake tables from ADF and other Azure services (like Data Catalog). I don't see a Databricks data store listed in the ADF data sources.

On a similar question: "Is it possible to read an Azure Databricks table from Azure Data Factory?"

@simon_dmorias seems to have suggested using an ODBC connection to connect to Databricks tables.

I tried to set up the ODBC connection, but it requires an IR to be set up. There are two options I see when creating the IR: Self-Hosted and Linked Self-Hosted. I tried to create the Self-Hosted IR, but it requires an installation on my local desktop and is probably meant more for an on-premises ODBC connection. I couldn't use the IR in my linked services.

I have been able to connect Power BI to Databricks Delta Lake tables and plan to use the same credentials here. Here is the reference link:

https://docs.azuredatabricks.net/user-guide/bi/power-bi.html

Any guidance will be helpful.

Mauryas

3 Answers


You can, but it is quite complex. You need to use the ODBC connector in Azure Data Factory with a self-hosted integration runtime.

ADF can connect using ODBC (https://learn.microsoft.com/en-us/azure/data-factory/connector-odbc). It does require a self-hosted IR. Assuming you have the right drivers installed, you can configure the ODBC connection to a Databricks cluster.

The connection details for the ODBC settings can be found on the cluster settings screen in the Databricks workspace (https://learn.microsoft.com/en-us/azure/azure-databricks/connect-databricks-excel-python-r).
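To sanity-check those details before pointing ADF at them, a quick pyodbc test from the machine that will host the self-hosted IR works well. This is a rough sketch: the DSN name and table name are placeholders, and it assumes you created a system DSN from the cluster's connection details with a personal access token as the password.

    # Rough sketch: verify the Databricks ODBC connection from the machine
    # that hosts the self-hosted IR. Assumes the Spark ODBC driver is
    # installed and a system DSN named "Databricks" (hypothetical name)
    # was created from the cluster's connection details.
    import pyodbc

    conn = pyodbc.connect("DSN=Databricks", autocommit=True)
    cursor = conn.cursor()

    # Read a Delta table the same way an ADF ODBC dataset would.
    cursor.execute("SELECT * FROM default.my_delta_table LIMIT 10")  # placeholder table
    for row in cursor.fetchall():
        print(row)

If that returns rows, the same connection details and credentials should work in the ADF ODBC linked service, with that IR selected as the runtime.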

The process is very similar to what you posted for Power BI.

simon_dmorias

Actually, I figured out that it is possible to get metadata from any table inside a Databricks workspace directly, using the ODBC connection available in the current version of Azure Data Catalog. A native connector would be much better, but for now, if you want to give it a try, just fill in the info below in the Azure Data Catalog publishing app (there is a small connection-test sketch after the field list):

Driver: Microsoft Spark ODBC Driver (it must be installed on your system)

Connection String: host=eastus.azuredatabricks.net;port=443;SSL=1;HTTPPath=sql/protocolv1/o/XXXXXXXXXXXXXXX/XXXX-XXXXXX-XXXXXX;transportMode=http;AuthMech=8

User: token

Password: dapiXXXXXXXXXXXXXXXXXXXXX

And leave the Database field blank.
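If you want to verify those values before entering them in Data Catalog, here is a small untested sketch (Python with pyodbc) that assembles the same fields into a DSN-less connection string and lists the tables whose metadata the crawler would see. Every XXXX value is a placeholder for your own workspace, cluster, and token.

    # Untested sketch: check the ODBC settings above before using them in
    # the Azure Data Catalog publishing app. All XXXX values are placeholders.
    import pyodbc

    conn = pyodbc.connect(
        "Driver=Microsoft Spark ODBC Driver;"
        "Host=eastus.azuredatabricks.net;"
        "Port=443;"
        "SSL=1;"
        "HTTPPath=sql/protocolv1/o/XXXXXXXXXXXXXXX/XXXX-XXXXXX-XXXXXX;"
        "transportMode=http;"
        "AuthMech=8;"
        "UID=token;"                       # literal user name "token"
        "PWD=dapiXXXXXXXXXXXXXXXXXXXXX;",  # personal access token
        autocommit=True,
    )

    # List the tables whose metadata Data Catalog would pick up.
    for table in conn.cursor().tables():
        print(table.table_schem, table.table_name)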

Ronieri Marques

Please refer to the Azure Data Factory section of the official Azure Databricks document User Guide > Developer Tools > Managing Dependencies in Data Pipelines. There you will see two Azure documents listed in that topic about how to create a Databricks notebook with the Databricks Notebook Activity and run it to do a data-transfer task in Azure Data Factory, as below (a short example of such a notebook follows the list). I think it will help you realize your needs.

  1. Run a Databricks notebook with the Databricks Notebook Activity in Azure Data Factory
  2. Transform data by running a Databricks notebook
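For reference, here is a minimal sketch of the kind of notebook the Databricks Notebook Activity runs. The widget name, table name, and output path are made up for illustration; spark and dbutils are provided by the Databricks runtime.

    # Minimal sketch of a notebook invoked by the ADF Databricks Notebook
    # Activity. `spark` and `dbutils` are injected by the Databricks runtime.
    # The widget name, table name, and output path are hypothetical.

    # ADF pipeline parameters (baseParameters on the activity) arrive as widgets.
    target_path = dbutils.widgets.get("target_path")

    # Copy a Delta table out to the path ADF passed in.
    df = spark.read.table("default.my_delta_table")
    df.write.mode("overwrite").parquet(target_path)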
Peter Pan
  • Hi Peter. My ask is primarily to be able to connect to Databricks Hive tables from ADF, just like one connects to a SQL database table. I have been working with Databricks notebook execution from ADF, and that isn't really a problem. – Mauryas Sep 13 '19 at 13:02