I am exploring options for extracting data lineage information from the logs of Spark programs.

I am looking for information such as which Kafka topics or tables a Spark program reads from or writes to, so that we can capture this at runtime and build an end-to-end view of data movement. Has anyone explored such a framework?

With the log level set to INFO, I can see the input Kafka topic being read and the table the data is written to; however, I get no information when data is sent out to a Kafka topic or when an input table is read.

Any help appreciated.

Thanks & Regards.

1 Answer


From my analysis, enabling the DEBUG log level for the Spark program gives you a detailed trace of events.
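For example, the level can be raised programmatically (a minimal sketch; it assumes an existing SparkSession named `spark`):

```scala
// Raise the log level for this application at runtime. Per the analysis above,
// DEBUG surfaces source/sink details (Kafka topics, JDBC tables) in the driver
// logs, at the cost of very verbose output.
spark.sparkContext.setLogLevel("DEBUG")

// Alternatively, scope DEBUG to the Kafka connector's package in your log4j
// configuration instead of turning it on globally (log4j 1.x syntax):
// log4j.logger.org.apache.spark.sql.kafka010=DEBUG
```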

You can track the following (a plan-based alternative is sketched after this list):

  1. The name of the input Kafka topic being read
  2. Which DB tables are read from
  3. Which DB tables are written to
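
If parsing DEBUG logs proves brittle, a more structured option (not part of the original answer, so treat it as a separate suggestion) is to hook into query execution directly with Spark's `QueryExecutionListener`. Below is a minimal sketch; the `LineageListener` class name and the plain `println` reporting are illustrative choices:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.execution.QueryExecution
import org.apache.spark.sql.util.QueryExecutionListener

// Illustrative listener: after every successful DataFrame action or write,
// print the analyzed plan's leaf nodes (the inputs: tables, Kafka relations,
// files) and its root node (for writes, typically a command naming the sink).
class LineageListener extends QueryExecutionListener {
  override def onSuccess(funcName: String, qe: QueryExecution, durationNs: Long): Unit = {
    println(s"[lineage] action=$funcName root=${qe.analyzed.getClass.getSimpleName}")
    qe.analyzed.collectLeaves().foreach(leaf => println(s"[lineage] input: $leaf"))
  }
  override def onFailure(funcName: String, qe: QueryExecution, exception: Exception): Unit = ()
}

val spark = SparkSession.builder().appName("lineage-demo").getOrCreate()
spark.listenerManager.register(new LineageListener)
```

Structured Streaming jobs (the usual way Kafka is read and written) are better served by `StreamingQueryListener`, registered via `spark.streams.addListener`, whose progress events expose `sources` and `sink` descriptions that name the topics.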

Regards.