I use Spark for data processing. Starting from the data sources (mostly CSV files), I would like to put in place a data pipeline with the right stages to validate, test, and transform the data, and to promote it through different environments (CI/CD, QA, UAT, LIVE, etc.). A rough sketch of what I mean is below.
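To make the question more concrete, here is a minimal PySpark sketch of the kind of stages I have in mind (ingest, validate, publish). The paths, the schema, and the `env` value are placeholders for illustration only, not my actual setup:

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F
from pyspark.sql.types import StructType, StructField, StringType, DoubleType

spark = SparkSession.builder.appName("csv-pipeline-sketch").getOrCreate()

# Stage 1: ingest raw CSVs with an explicit schema (no inferSchema surprises).
schema = StructType([
    StructField("id", StringType(), nullable=False),
    StructField("amount", DoubleType(), nullable=True),
])
raw = spark.read.csv("/data/raw/orders/*.csv", header=True, schema=schema)

# Stage 2: basic quality checks before anything is promoted downstream.
bad_rows = raw.filter(F.col("id").isNull() | F.col("amount").isNull())
if bad_rows.count() > 0:
    raise ValueError("Validation failed: null ids or amounts in the raw CSVs")

# Stage 3: transform and publish to a curated zone; the environment name
# (qa / uat / live) would be injected by the CI/CD pipeline.
env = "qa"
curated = raw.withColumn("ingest_date", F.current_date())
curated.write.mode("overwrite").parquet(f"/data/{env}/curated/orders")
```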
Is there an established data-pipeline "blueprint" for this?