
My assumptions about where MDF might be a right fit are as follows:

  1. MDF can be used as a data wrangling tool by end users

  2. MDF is better suited for SQL Server-based data warehouse architectures, to load the data into staging or a data lake in clean format (prepare the data before loading it into the SQL Server DWH, then use a proper ETL tool for the transformations)

  3. If MDF has to be used for light ELT/ETL tasks directly on a data lake or DWH, it needs customization for complex transformations (see the sketch after this list)...
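To make assumption 3 concrete, here is a hypothetical sketch in PySpark (table, column, and path names are made up) of the kind of transformation I mean, one that goes beyond MDF's built-in expression language:

```python
# Hypothetical "complex transformation": deduplicate events per customer with
# a window function, then apply bespoke parsing logic via a UDF. All names
# (events, customer_id, raw_payload, paths) are illustrative placeholders.
from pyspark.sql import SparkSession, Window
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("complex-transform-sketch").getOrCreate()

events = spark.read.parquet("abfss://lake@myaccount.dfs.core.windows.net/raw/events")

# Keep only the latest event per customer (window + row_number)
w = Window.partitionBy("customer_id").orderBy(F.col("event_time").desc())
latest = (
    events
    .withColumn("rn", F.row_number().over(w))
    .filter(F.col("rn") == 1)
    .drop("rn")
)

# Custom parsing with no simple equivalent in MDF's expression language
@F.udf("string")
def parse_payload(raw):
    # placeholder for bespoke business parsing logic
    return raw.split("|")[0] if raw else None

(latest
    .withColumn("order_ref", parse_payload("raw_payload"))
    .write.mode("overwrite")
    .parquet("abfss://lake@myaccount.dfs.core.windows.net/curated/events"))
```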

My questions would be:

A) Has anyone used Mapping Data Flow in production for options 2 and 3 above?

B) If assumption 3 is valid, would you suggest going for a Spark-based transformation or an ETL tool rather than patching MDF with customizations, since new versions might not be compatible with them, etc.?

Cengiz

1 Answer

I disagree with most of your assumptions. Data Flow is part of a larger ETL environment, either Azure Data Factory (ADF) or Azure Synapse Pipelines, and you really can't separate it from its host. Data Flow is a UI code generator that executes at runtime as a Spark job. If your end user is a data engineer, then yes, Data Flow is a good tool for them.
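To illustrate that point: a simple Data Flow with a source, a derived column, and a sink is conceptually equivalent to a small Spark job like the following sketch (illustrative paths and column names, not the actual generated code):

```python
# Conceptual Spark equivalent of a simple Data Flow:
# source -> derived column -> sink. Paths and names are placeholders.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("dataflow-equivalent").getOrCreate()

# Source transformation: read the staged file
df = spark.read.csv("abfss://staging@myaccount.dfs.core.windows.net/sales.csv",
                    header=True)

# Derived-column transformation: compute a new column
df = df.withColumn("net_amount", F.col("amount").cast("double") * 0.9)

# Sink transformation: write to the curated zone
df.write.mode("overwrite").parquet(
    "abfss://curated@myaccount.dfs.core.windows.net/sales")
```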

ADF is a great service for orchestrating data operations. ADF supports all the things you mentioned (SSIS, Notebooks, Stored Procedures, and many more). It also supports Data Flow, which is absolutely a "proper" tool for transformations and has a very rich feature set. In fact, if you are NOT doing transformations, Data Flow is likely overkill for your solution.
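As a minimal sketch of what that orchestration looks like from code (assuming the azure-identity and azure-mgmt-datafactory packages; the subscription, resource group, factory, and pipeline names are placeholders), here is how you might trigger an ADF pipeline that chains a Data Flow, a notebook, and a stored procedure:

```python
# Minimal sketch: trigger an ADF pipeline run via the management SDK.
# All resource names below are placeholders.
from azure.identity import DefaultAzureCredential
from azure.mgmt.datafactory import DataFactoryManagementClient

credential = DefaultAzureCredential()
client = DataFactoryManagementClient(credential, "<subscription-id>")

# Kick off a pipeline that could chain a Data Flow, a notebook,
# and a stored procedure activity
run = client.pipelines.create_run(
    resource_group_name="my-rg",
    factory_name="my-adf",
    pipeline_name="LoadSalesPipeline",
    parameters={"loadDate": "2022-03-28"},
)
print(f"Started pipeline run: {run.run_id}")
```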

Joel Cochran
  • ADF can perform light ETL/ELT jobs, whereas SSIS is used to support complex requirements. ADF with Data Flow is an alternative to SSIS (but it is not there yet to replace SSIS). Do you suggest that it can perform any transformation (small or complex), and what is the right approach for customizations? How will the customizations be supported in new versions? – Cengiz Mar 28 '22 at 21:45