2

We are currently creating a new Azure Synapse Analytics environment to house our data. For this, we need to connect from Synapse to our on-premise SQL Server databases.

These databases are not reachable from the open internet, they are only accessible from within our own network. For our other applications on Azure, we have setup a hub-spoke model network, where the hub connects to our internal network, and VNets in the spoke are peered to the hub. This way we have been able to connect to on-premise database servers in other applications like AKS.

However, this doesn't seem to work for Azure Synapse. So how to do this then?

What I tried so far

The Azure Data Factory docs contain the following page outlining a guide for making this connection, using a private link service, a load balancer and a couple of Virtual Machines configured to forward traffic on specific ports towards the server using iptables.

I have implemented this solution, but I don't really trust this. Most of all, the iptables configuration doesn't seem to persist when a VM reboot. This doesn't inspire a lot of trust in this solution, since this doesn't feel like something that should be used in production as is.

At the same time, I can't imagine that connecting to an on-premise database server is such a stretch, that there are no better solutions around. We can't be the only one stumbling onto this problem.

So my question, is there really no other, better, way to set up this connection?

JAD
  • 2,035
  • 4
  • 21
  • 35
  • `iptables` doesn't persist by default. Look into the `iptables-save` manpages for your distro to figure out where to persist the IPv4 and/or IPv6 rules after modifying them. – AlwaysLearning May 03 '22 at 12:50
  • @AlwaysLearning I have cobbled together a way to make these configs more persistent, but I would very much prefer a way to do this without having to fiddle with these kinds of settings in vms. – JAD May 03 '22 at 12:59
  • 2
    Have you seen this video..https://howtoanalytics.info/2021/02/17/integration-with-on-premises-data-sources-in-azure-synapse-analytics/ – Anand Sowmithiran May 03 '22 at 17:43
  • 1
    It explains about using `self hosted` integration runtime – Anand Sowmithiran May 03 '22 at 17:51
  • @AnandSowmithiran that looks promising. I'll investigate that. Thanks! – JAD May 04 '22 at 08:23
  • you can refer this document [Connect a local SQL Server database to Azure Synapse](https://www.kevinrchant.com/2021/06/28/connect-a-local-sql-server-database-to-azure-synapse/) – Pratik Lad May 05 '22 at 11:22

2 Answers2

4

Assuming your on-premises SQL Server running on Windows machine, you can go with Self-hosted Integration Runtime (IR) to connect Azure Synapse Analytics with on-premises SQL Server.

A self-hosted integration runtime can run copy activities between a cloud data store and a data store in a private network. It also can dispatch transform activities against compute resources in an on-premises network or an Azure virtual network. The installation of a self-hosted integration runtime needs an on-premises machine or a virtual machine inside a private network.

The considerations for using the self-hosted IR is mentioned here.

You can create and configure the self-hosted IR either through PowerShell or UI. Though UI is the easiest method to deploy it. You can refer the steps given in Microsoft official document to do the same using UI.

Utkarsh Pal
  • 4,079
  • 1
  • 5
  • 14
  • We ended up getting this to work. However, it should be noted that this severely limits the features available in Synapse. You cannot connect direct to the on-premise datasources using Spark, nor can you use the data pipeline tools. From what I could tell, the only way really is to do a (close to) verbatim copy into the data lake, and then do the other processing from there. – JAD Jun 13 '22 at 07:02
2

We have the same problem's, and basically we only found 3 ways to copy data from on-premises SQL server to Synapse:

  1. In theory you could open all Microsoft Synapse public IP's in your on-premises firewall, then you can use Azure IR (It will be open to every Azure Synapse in the same region). List of IP addresses
  2. Install Self-Hosted IR in a on-premises computer
  3. Install a Self-Hosted IR in a VM in Azure which is in a VNET that have access to your on-premises network (VPN/Express Route)