Questions tagged [spline-data-lineage-tracker]

10 questions
3 votes, 2 answers

Error enabling lineage in spark using spline?

I tried using Spline to track lineage in Spark using both ways specified here, but both of them failed with the same error: ERROR QueryExecutionEventHandlerFactory: Spline Initialization Failed! Spark lineage tracking is disabled Spark Agent was not…
Shubham Jain
3 votes, 2 answers

Errors while installing Spline (Data Lineage Tool for Spark)

I am trying to install Apache Spline on Windows. My Spark version is 2.4.0 and my Scala version is 2.12.0. I am following the steps mentioned here: https://absaoss.github.io/spline/ I ran the docker-compose command and the UI is up. wget…
Ayan Biswas
1 vote, 1 answer

Can't view lineage using UI for Spline

I have tried everything; the code even writes the data, but Spline is unable to pick it up. My code runs successfully, but there is no data in the Spline UI. Spark - 3.3.1, Scala - 2.12.18, Python - 3.9.6, Spline agent - 1.1.0. Can someone guide me in…
1 vote, 1 answer

Unable to view the pyspark job in Spline using ec2 instance

We created a sample PySpark job and ran the following spark-submit command on an EC2 instance: sudo ./bin/spark-submit --packages za.co.absa.spline.agent.spark:spark-3.1-spline-agent-bundle_2.12:0.6.1 --conf…
1 vote, 1 answer

Finding spark pipeline start time from spline lineage

I'm exploring Spline to determine how much time Spark took to execute a pipeline (from initialising the Spark context until writing the result). I could see "timestamp":1611397050192 in the Spline lineage file, which is actually the write time. Is…
syv
0 votes, 1 answer

Unable to use Spline Lineage with AWS Glue 4.0 | Failure

I'm trying to capture the lineage of a PySpark job using Spline in AWS Glue that does transformations using DataFrame APIs and then writes the output to S3 as Delta tables. For now, I want the lineage on the console, but in the end state I want to capture…
0 votes, 1 answer

Spline, pyspark: How to get spline console output in my python code?

In my PySpark code I'm reading a test CSV file, filtering it, and writing it. I can see all of those actions in the console with LoggingLineageDispatcher in JSON format, but I want to find a way to get this data directly in my Python code. I can't find any options for…
Andrej Vilenskij
0 votes, 1 answer

Azure Databricks: trying to run Spline for capturing Spark lineage?

I am trying to set up Spline in Azure Databricks but am facing this issue. Any help with this? :6: error: identifier expected but double literal found. --packages za.co.absa.spline.agent.spark:spark-3.0-spline-agent-bundle_2.12:0.6.1…
ShadowWarrior
0 votes, 1 answer

Need to Re-Write Scala Code for Specific JSON Output

I am trying to register Databricks notebook lineage in Azure Purview through Spline and the Apache Atlas API. There are two versions of the code: 1) is the original code, which uses Databricks Runtime version 6.4 and is working as expected, but we need to…
0 votes, 1 answer

spline spark agent jar has errors during post processing

I have been trying to run the following code with the new Spline jar: za.co.absa.spline.agent.spark:spark-3.0-spline-agent-bundle_2.12:0.6.0 but have been getting errors specific to UserExtraMetadataProvider, which has been deprecated in the newer…