
I'm using Azure Data Factory with SQL Server as the source and Postgres as the target. The goal is to copy 30 tables from SQL Server, with transformation, to 30 tables in Postgres. The hard part is that I have 80 source and 80 target databases, all with the exact same layout but different data. It's one database per customer, so 80 customers each with their own database.

Linked services don't allow parameters for Postgres. I have one dataset per source and target, using parameters for schema and table names, and one pipeline per table with a SQL Server source and Postgres target. I can parameterize the SQL Server linked service but not the Postgres one. The problem is: how can I copy 80 source databases to 80 target databases without adding 80 target linked services and 80 target datasets? Plus I'd have to repeat all 30 pipelines per target database.

BTW, I'm only familiar with the UI; however, anything else that does the job is acceptable. Any help would be appreciated.

BobL
  • This question looks to be more suitable for [dba.se] instead. This site is for questions related to programming (code, including SQL) or use of a programmer's tool. You can find more information about this site in the [help]. – Ken White Jan 21 '22 at 00:43
  • Yes, there is no sink/table destination for postgres – Oki Kuswanda Jan 22 '22 at 17:50
  • Yes there is a sink destination for postgres (single server) but not for postgres flexi server – BobL Jan 22 '22 at 22:14
  • This is not a DB question. It's an Azure Data Factory question about how to parameterize for multiple DBs. – BobL Jan 22 '22 at 22:16

1 Answer


There is a simple way to implement this. Essentially you need a single linked service that reads the connection string out of Key Vault. You can then parameterize source and target as Key Vault secret names and switch between databases by simply changing the secret name. This relies on all connection-related information being contained in a single connection string.

I will provide a simple overview for PostgreSQL, but the same logic applies to SQL Server as the source.

  1. Implement a Linked Service for Azure Key Vault.
  2. Add a Linked Service for Azure PostgreSQL that reads its connection string from Key Vault, with the secret stored in the format: Server=your_server_name.postgres.database.azure.com;Database=your_database_name;Port=5432;UID=your_user_name;Password=your_password;SSL Mode=Require;Keepalive=600; (I advise using the server name as the secret name).
  3. Pass this parameter, which is essentially the secret name, into the pipeline. You can also implement a loop that accepts an array of secret names and runs the copy once per element; see the ForEach sketch after the pipeline definition below.

Linked Service Definition for KeyVault:

{
    "name": "your_keyvault_name",
    "properties": {
        "description": "KeyVault",
        "annotations": [],
        "type": "AzureKeyVault",
        "typeProperties": {
            "baseUrl": "https://your_keyvault_name.vault.azure.net/"
        }
    }
}

Linked Service Definition for Postgresql:

{   "name": "generic_postgres_service".
    "properties": {
        "type": "AzurePostgreSql",
        "parameters": {
            "pg_database": {
                "type": "string",
                "defaultValue": "your_database_name"
            }
        },
        "annotations": [],
        "typeProperties": {
            "connectionString": {
                "type": "AzureKeyVaultSecret",
                "store": {
                    "referenceName": "KeyVaultName",
                    "type": "LinkedServiceReference"
                },
                "secretName": "@linkedService().secret_name_for_server"
            }
        },
        "connectVia": {
            "referenceName": "AutoResolveIntegrationRuntime",
            "type": "IntegrationRuntimeReference"
        }
    }
}
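
For the SQL Server side, the same Key Vault pattern can be used. Below is only a minimal sketch of a parameterized SQL Server linked service (the names are placeholders; add a connectVia integration runtime reference if your source is on-premises):

{
    "name": "generic_sqlserver_service",
    "properties": {
        "type": "SqlServer",
        "parameters": {
            "secret_name_for_server": {
                "type": "string"
            }
        },
        "annotations": [],
        "typeProperties": {
            "connectionString": {
                "type": "AzureKeyVaultSecret",
                "store": {
                    "referenceName": "your_keyvault_name",
                    "type": "LinkedServiceReference"
                },
                "secretName": "@linkedService().secret_name_for_server"
            }
        }
    }
}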

Dataset Definition for Postgresql:

{
    "name": "your_postgresql_dataset",
    "properties": {
        "linkedServiceName": {
            "referenceName": "generic_postgres_service",
            "type": "LinkedServiceReference",
            "parameters": {
                "secret_name_for_server": {
                    "value": "@dataset().secret_name_for_server",
                    "type": "Expression"
                }
            }
        },
        "parameters": {
            "secret_name_for_server": {
                "type": "string"
            }
        },
        "annotations": [],
        "type": "AzurePostgreSqlTable",
        "schema": [],
        "typeProperties": {
            "schema": {
                "value": "@dataset().schema_name",
                "type": "Expression"
            },
            "table": {
                "value": "@dataset().table_name",
                "type": "Expression"
            }
        }
    }
}

Pipeline Definition for Postgresql:

{
    "name": "your_postgres_pipeline",
    "properties": {
        "activities": [
            {
                "name": "Copy_Activity_1",
                "type": "Copy",
                "dependsOn": [],
                "policy": {
                    "timeout": "7.00:00:00",
                    "retry": 0,
                    "retryIntervalInSeconds": 30,
                    "secureOutput": false,
                    "secureInput": false
                },
                ...
                ... (rest of the Copy activity definition skipped)
                ...
                "inputs": [
                    {
                        "referenceName": "your_postgresql_dataset",
                        "type": "DatasetReference",
                        "parameters": {
                            "secret_name_for_server": "secret_name"
                        }
                    }
                ]
            }
        ],
        "annotations": []
    }
}
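
To cover all 80 customer databases without duplicating pipelines (the loop mentioned in step 3), you could wrap the copy in a ForEach over an array of secret names. This is only a rough sketch with placeholder names: it assumes your_postgres_pipeline exposes a secret_name_for_server pipeline parameter and passes it through to the dataset instead of the hard-coded value above.

{
    "name": "copy_all_customers_pipeline",
    "properties": {
        "parameters": {
            "server_secret_names": {
                "type": "array",
                "defaultValue": ["customer1_server", "customer2_server"]
            }
        },
        "activities": [
            {
                "name": "ForEach_Customer",
                "type": "ForEach",
                "dependsOn": [],
                "typeProperties": {
                    "items": {
                        "value": "@pipeline().parameters.server_secret_names",
                        "type": "Expression"
                    },
                    "activities": [
                        {
                            "name": "Run_Copy_For_Customer",
                            "type": "ExecutePipeline",
                            "typeProperties": {
                                "pipeline": {
                                    "referenceName": "your_postgres_pipeline",
                                    "type": "PipelineReference"
                                },
                                "parameters": {
                                    "secret_name_for_server": "@item()"
                                },
                                "waitOnCompletion": true
                            }
                        }
                    ]
                }
            }
        ],
        "annotations": []
    }
}

This keeps one Postgres linked service, one dataset and one pipeline per table, while the per-customer connection details live entirely in Key Vault secrets.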
Cninroh