I have the same workflow deployed in two different environments. To validate that both workflows are identical, I feed the same input data to both. If they are identical, I expect the output dataset of each workflow to be the same.
One constraint: I cannot alter the workflow in any way (add/remove DAGs, etc.).
Which tool is best suited for this use case? I have been reading up on data validation frameworks like Apache Griffin and Great Expectations. Can either of these be used here, or is there a simpler alternative?
Update: I forgot to mention that I want the validation process to be as non-interactive as possible. The Great Expectations tutorial talks about manually opening and running Jupyter notebooks, and I want to minimize manual steps like that as much as possible, if that makes sense.
Update 2:
Dataset produced by the workflow in the first environment:
| Name | Value |
|------|-------|
| ABC  | 10    |
| DEF  | 20    |
Dataset produced by the workflow in the second environment:
| Name | Value |
|------|-------|
| DEF  | 20    |
| ABC  | 10    |
After running the validation, I want it to report that the two datasets are identical, even though the rows are in a different order.
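For reference, here is the kind of non-interactive check I have in mind. It is just a minimal sketch using pandas, assuming both outputs can be loaded as CSV files; the file paths and column names are placeholders, not the real workflow outputs:

```python
import pandas as pd

# Placeholder paths for wherever each workflow writes its output dataset.
df_env1 = pd.read_csv("env1_output.csv")
df_env2 = pd.read_csv("env2_output.csv")

def normalize(df: pd.DataFrame) -> pd.DataFrame:
    """Sort columns and rows so that ordering differences don't matter."""
    df = df[sorted(df.columns)]  # consistent column order
    return df.sort_values(by=list(df.columns)).reset_index(drop=True)

# equals() also compares dtypes, so the two datasets must agree on types.
if normalize(df_env1).equals(normalize(df_env2)):
    print("Datasets are identical (ignoring row order).")
else:
    print("Datasets differ.")
```

Something like this could run as a plain script in a scheduled job with no interactive steps, but I am not sure it scales to large datasets, which is why I am asking whether a dedicated framework is the better fit.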