-1

Situation:

I have two databases DB1 and DB2. DB1 has 9 tables (out of these 9, any of the tables could get updated with only column to track that change i.e TIME_STAMP column).

Now I am trying to bring only the Updated/New records from these 9 tables from DB1 to DB2 (each table in DB1 map to one table in DB2) using Azure Data Factory and then run my final query on the tables in DB2. I have only read access on DB1.

Logic:

  1. Find the set of the Ids using UNION which got updated/inserted in these 9 tables
  2. Inner join the result of the UNION query with each of the 9 tables individually

Union Query Example:

SELECT DISTINCT idcolumn 
FROM table1
WHERE TIME_STAMP BETWEEN '2020-11-06T13:14:13.807Z' AND  '2020-11-07T13:14:13.807Z' 

UNION 

SELECT DISTINCT idcolumn 
FROM table2
WHERE TIME_STAMP BETWEEN '2020-11-06T13:14:13.807Z' AND  '2020-11-07T13:14:13.807Z'

UNION 

SELECT DISTINCT idcolumn 
FROM table3
WHERE TIME_STAMP BETWEEN '2020-11-06T13:14:13.807Z' AND  '2020-11-07T13:14:13.807Z'

UNION 
    
SELECT DISTINCT idcolumn 
FROM table4
WHERE TIME_STAMP BETWEEN '2020-11-06T13:14:13.807Z' AND  '2020-11-07T13:14:13.807Z' 

And similarly for 5 other tables.

Problem:

Is there a way in ADF with which I can find the union query result only once and then join the resultant data with all the 9 tables in one go rather than running the union query 9 times with each table in 9 different copy activity?

Prashant Babber
  • 431
  • 4
  • 6
  • 1
    Your question is not clear, where are the 9 tables, all in DB1? How did you get the IDs? If the tables are not all in DB1 and you only the read permission on DB1, you could not read the data in DB2. – Leon Yue Oct 12 '20 at 00:19
  • please check, I have updated the question – Prashant Babber Oct 16 '20 at 05:54
  • did you run the query in Source `query` option? – Leon Yue Oct 16 '20 at 05:59
  • yes inside ADF copy activity I am running it in source query and my question is if I can use the output of this query in subsequent activities to do an inner join with tables in DB1 – Prashant Babber Oct 16 '20 at 13:44
  • 1
    yes, you can do that in Data flow. Data flow source support query and has join active. – Leon Yue Oct 17 '20 at 14:12
  • Is it possible to do it with ADF pipelines without Data Flow? also, if you think the question is clear now, can you please open it again? – Prashant Babber Oct 21 '20 at 04:03
  • 1
    Just with Copy active, it's impossible. The question is closed by others. Do you need I post my comment as answer? I will try to reopen it. – Leon Yue Oct 21 '20 at 05:46
  • 1
    Hi @Prashant Babber, your question is reopened now. If my answer is helpful for you, hope you can accept it as answer( click on the check mark beside the answer to toggle it from greyed out to filled in.). This can be beneficial to other community members. Thank you. – Leon Yue Oct 22 '20 at 01:43

1 Answers1

2

You can achieve that with Data Flow. For example,

  1. Create a Source 1: run the query to get source dataset.
  2. Create Source 2.
  3. Add a Join active to join with Source 1 and Source 2.

Here's the data flow overview: enter image description here

Just with Copy active in ADF pipeline, it's impossible. We can not join the A Copy active source to B copy active's source.

Leon Yue
  • 15,693
  • 1
  • 11
  • 23