I use Spark for data processing. Starting from the data sources (mostly CSV files), I would like to put in place a data pipeline with the right stages to control/test/manipulate the data and to deploy it to different environments (CI/CD, QA, UAT, LIVE, etc.).
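
To make the question concrete, here is a minimal sketch of the kind of staged pipeline I have in mind, written in PySpark. Everything in it is hypothetical: the paths, the schema, the `ENV_PATHS` mapping, and the extract/validate/transform split are only illustrations of the stages described above, not an existing design.

```python
# Hypothetical sketch of a staged CSV pipeline; all names and paths
# are placeholders, not part of a real project.
from pyspark.sql import SparkSession, DataFrame
from pyspark.sql import functions as F
from pyspark.sql.types import StructType, StructField, StringType, DoubleType

# One output location per deployment stage; "promoting" the data means
# rerunning the same pipeline against the next environment's config.
ENV_PATHS = {
    "qa":   "s3://my-bucket/qa/orders/",
    "uat":  "s3://my-bucket/uat/orders/",
    "live": "s3://my-bucket/live/orders/",
}

# Explicit schema so malformed CSV rows fail fast instead of being inferred.
SCHEMA = StructType([
    StructField("order_id", StringType()),
    StructField("amount", DoubleType()),
])

def extract(spark: SparkSession, source: str) -> DataFrame:
    return (spark.read
            .option("header", "true")
            .option("mode", "FAILFAST")  # abort on rows that break the schema
            .schema(SCHEMA)
            .csv(source))

def validate(df: DataFrame) -> DataFrame:
    # "Control/test" stage: reject rows that break basic invariants.
    checked = df.filter(F.col("order_id").isNotNull() & (F.col("amount") >= 0))
    if checked.count() == 0:
        raise ValueError("validation left no rows -- aborting the run")
    return checked

def transform(df: DataFrame) -> DataFrame:
    # "Manipulate" stage: whatever business logic the pipeline needs.
    return df.withColumn("amount_cents", (F.col("amount") * 100).cast("long"))

def run(env: str, source: str) -> None:
    spark = SparkSession.builder.appName(f"csv-pipeline-{env}").getOrCreate()
    result = transform(validate(extract(spark, source)))
    result.write.mode("overwrite").parquet(ENV_PATHS[env])

if __name__ == "__main__":
    run("qa", "data/raw/orders.csv")
```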

Is there an established data-pipeline "blueprint" for this?
