
I created an Azure Mapping Data Flow with an Aggregate transformation, shown in the first row of the first diagram below. I then added a new branch to the Employee source, and then a new data source, Department. When I tried to join the branched Employee source with the Department source (figure 2 below), I expected the Join transformation to appear between the branched Employee source and Department. Instead, the Join appears next to the first Employee source on the top row, and the Aggregate transformation and the Sink move down to the second row (figure 3 below). Question: Why is this happening, and how can I move the Join transformation between the branched Employee source and Department?

Employee Source with its Branch (below it) and Aggregate and Sink next to it:

enter image description here

Adding Join transformation between Branched Employee source and Department (hoping the Join will appear between Branched Employee source and Department):

enter image description here

But instead, the following happened. Why? Note: I can still join Employee and Department, but it will look ugly: the left part of the join is on top, and the right part (Department) is below the second row.

enter image description here

Expected graph [taken from another document on Data Factory]. I wanted something like this:

enter image description here

nam
  • I observed the same behavior when I was building a data flow, but could you not just name the block to make it obvious it's a join between the two? Consider this: if there were another data source between Employee and Department, what would you expect it to do then? – Omar Abdel Bari Dec 31 '21 at 23:46
  • I think for this to work it would only make sense as a general rule if they overhauled the UI to use a graph editor that isn't restricted from left to right (as opposed to the current one). – Omar Abdel Bari Dec 31 '21 at 23:47
  • Open the script behind, move the Join line in the script to above the aggregate. I put an example in my answer below. – Mark Kromer MSFT Jan 02 '22 at 21:46

1 Answer


Mapping Data Flows is not a free-form diagram. It is a construction graph that automatically adjusts node positions, connecting lines, and reference nodes to optimize space in the graph. Your diagrams above express exactly the same semantics to ADF; the only difference is that the UI moves the Join node to the top stream. If you look at the script behind that data flow, you'll see that ADF treats the original Employee source and the new Employee branch as the same object. If you'd like to influence the placement of the streams in your graph, open the script behind your flow and swap the lines, something like this:

    Employee aggregate(newfield = sum(1)) ~> Aggregate1
    Employee, Department join(surrogatekey == movie, joinType:'inner', broadcast: 'auto')~> Join1

... change it to this with Join on top ...

    Employee, Department join(surrogatekey == movie, joinType:'inner', broadcast: 'auto')~> Join1
    Employee aggregate(newfield = sum(1)) ~> Aggregate1

That will swap the order of the streams in your flow and should give you what you are looking for.
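If you would rather not edit the script by hand, the line swap described in this answer can be sketched in a few lines of Python. This is only an illustration, not an ADF tool: the helper name move_stream_before is hypothetical, and it assumes each transformation sits on a single line ending in "~> Name", as in the two-line example above. Real data flow scripts often spread one transformation over several lines, so treat this as a starting point.

```python
import re


def move_stream_before(script: str, moving: str, target: str) -> str:
    """Move the one-line statement that outputs `moving` so that it sits
    directly above the statement that outputs `target`.

    Assumption (not guaranteed by ADF): every transformation is written
    on a single line that ends with '~> <OutputName>'.
    """
    lines = script.splitlines()

    def index_of(name: str) -> int:
        # Match '~> Name' at the end of a line, allowing optional spaces.
        pattern = re.compile(r"~>\s*" + re.escape(name) + r"\s*$")
        for i, line in enumerate(lines):
            if pattern.search(line):
                return i
        raise ValueError(f"no statement outputs {name!r}")

    src = index_of(moving)
    dst = index_of(target)
    if src < dst:
        return script  # already above the target; nothing to do

    # Popping a line that comes after dst does not shift dst.
    statement = lines.pop(src)
    lines.insert(dst, statement)
    return "\n".join(lines)


# The two statements from the example in this answer.
script = (
    "Employee aggregate(newfield = sum(1)) ~> Aggregate1\n"
    "Employee, Department join(surrogatekey == movie, "
    "joinType:'inner', broadcast: 'auto')~> Join1"
)

reordered = move_stream_before(script, "Join1", "Aggregate1")
print(reordered)
```

Paste the reordered text back into the script editor of the data flow; only the order of the statements changes, not their content.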

Mark Kromer MSFT